contributed by Danyao Jin, Shannon Ma, Xinyi Li and Zifan Wang
We used data on 5.7 million flights from January 2017 to December 2017, collected from the BTS website. The data is made available in monthly files, with the option to select the fields for download. The specific data set used in this analysis can be found here.
Flight delay remains a huge problem for travelers worldwide. A flight is usually counted as delayed when it arrives or departs more than 15 minutes behind schedule. According to the U.S. Department of Transportation (DOT) Bureau of Transportation Statistics (BTS), 18.15% of all scheduled flights in the U.S. were delayed in 2017. The main causes of delay include late arrival of the previous flight, National Aviation System delay (congested airports and air traffic jams), and reasons within the airline's control. 1
As a traveler, what can I do to avoid a flight delay or minimize the delay time? Which airline performs best? Does travelling on a busy Monday increase my chance of being delayed? Which time of day might reduce the delay time? Do big airports like John F. Kennedy International Airport have the highest delay rates? Do the main causes of delay vary by airline and airport?
In this project, we used 2017 U.S. domestic flight data from the Bureau of Transportation Statistics to explore the factors that influence flight delay rate and delay time, to construct regression models for delay prediction, and to come up with advice for choosing the best plan for a given route.
Which factors influence the probability of flight delay and the delay time? The factors we considered important, and that were available in the database, were airline, day of week, time of day, airport, region and route. To what extent do they influence the probability of delay and the delay time? During the process, we noticed that the impact of day of week and time of day might vary by airline, so we also conducted analyses stratified by airline.
The U.S. Department of Transportation (DOT) Bureau of Transportation Statistics (BTS) collects detailed information on carrier on-time performance, including flight information, delay time, and delay reason for each flight. We downloaded the dataset from their website 3. We confined our research question to the latest year (2017) since a larger database could not be run on our computers.
library(tidyverse)
library(dplyr)
library(dslabs)
library(readr)
library(ggthemes)
library(RColorBrewer)
library(shiny)
library(plotly)
library(splitstackshape)
library(lsmeans)
library(rsconnect)
#read in dataset "flight2017.csv"
dat <- read_csv("C:/Users/jindanyao/Desktop/2018fall/2018fall/BST260/final project/database/flight2017.csv")
# check missing values:
colSums(is.na(dat))
## X1 YEAR QUARTER
## 0 0 0
## MONTH DAY_OF_MONTH DAY_OF_WEEK
## 0 0 0
## FL_DATE OP_UNIQUE_CARRIER OP_CARRIER_FL_NUM
## 0 0 0
## ORIGIN ORIGIN_CITY_NAME ORIGIN_STATE_ABR
## 0 0 0
## DEST DEST_CITY_NAME DEST_STATE_ABR
## 0 0 0
## CRS_DEP_TIME DEP_TIME DEP_DELAY_NEW
## 0 80308 80343
## DEP_DEL15 DEP_DELAY_GROUP CRS_ARR_TIME
## 80343 80343 0
## ARR_TIME ARR_DELAY_NEW ARR_DEL15
## 84674 95211 95211
## ARR_DELAY_GROUP CANCELLED CANCELLATION_CODE
## 95211 0 5591928
## DISTANCE CARRIER_DELAY WEATHER_DELAY
## 0 4645148 4645148
## NAS_DELAY SECURITY_DELAY LATE_AIRCRAFT_DELAY
## 4645148 4645148 4645148
## X
## 5674621
There are no missing values for the year, quarter, month, day of month, day of week, date, or origin and destination city/state of the flights. There are 80,343 missing values for departure delay time, the main outcome of interest in our study. We assume that the missing values are due to flight cancellation. For this study, we will look at the flights with non-missing departure delay times (missing values will be automatically excluded from the plots and regression models).
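The cancellation assumption can be checked directly. The output above shows 5,591,928 missing cancellation codes out of 5,674,621 rows, i.e. about 82,693 flights with a code, which is close to but not the same as the 80,343 missing departure delays. Below is a minimal sketch of such a check, assuming only the `CANCELLED` and `DEP_DELAY_NEW` columns; the `toy` data frame and its values are invented for illustration and stand in for `dat`.

```r
library(dplyr)

# Toy stand-in for `dat`; the values are invented for illustration only.
toy <- tibble(
  CANCELLED     = c(0, 0, 1, 1, 0, 0),
  DEP_DELAY_NEW = c(0, 22, NA, NA, NA, 5)
)

# Cross-tabulate missing departure delays against the cancellation flag.
missing_by_cancel <- toy %>%
  mutate(dep_delay_missing = is.na(DEP_DELAY_NEW)) %>%
  count(CANCELLED, dep_delay_missing)
missing_by_cancel

# Count missing delays that are NOT explained by a cancellation.
unexplained <- toy %>%
  filter(is.na(DEP_DELAY_NEW), CANCELLED == 0) %>%
  nrow()
unexplained
```

Running the same two steps on `dat` would show how many of the missing delay times fall outside cancelled flights.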
We also look at the distributions of the relevant variables:
summary(dat)
## X1 YEAR QUARTER MONTH
## Min. : 1 Min. :2017 Min. :1.000 Min. : 1.000
## 1st Qu.:1418656 1st Qu.:2017 1st Qu.:2.000 1st Qu.: 4.000
## Median :2837311 Median :2017 Median :3.000 Median : 7.000
## Mean :2837311 Mean :2017 Mean :2.516 Mean : 6.546
## 3rd Qu.:4255966 3rd Qu.:2017 3rd Qu.:3.000 3rd Qu.: 9.000
## Max. :5674621 Max. :2017 Max. :4.000 Max. :12.000
##
## DAY_OF_MONTH DAY_OF_WEEK FL_DATE OP_UNIQUE_CARRIER
## Min. : 1.00 Min. :1.00 Min. :2017-01-01 Length:5674621
## 1st Qu.: 8.00 1st Qu.:2.00 1st Qu.:2017-04-05 Class :character
## Median :16.00 Median :4.00 Median :2017-07-03 Mode :character
## Mean :15.76 Mean :3.94 Mean :2017-07-02
## 3rd Qu.:23.00 3rd Qu.:6.00 3rd Qu.:2017-09-29
## Max. :31.00 Max. :7.00 Max. :2017-12-31
##
## OP_CARRIER_FL_NUM ORIGIN ORIGIN_CITY_NAME
## Min. : 1 Length:5674621 Length:5674621
## 1st Qu.: 736 Class :character Class :character
## Median :1679 Mode :character Mode :character
## Mean :2143
## 3rd Qu.:3064
## Max. :8402
##
## ORIGIN_STATE_ABR DEST DEST_CITY_NAME
## Length:5674621 Length:5674621 Length:5674621
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## DEST_STATE_ABR CRS_DEP_TIME DEP_TIME DEP_DELAY_NEW
## Length:5674621 Min. : 1 Min. : 1 Min. : 0.00
## Class :character 1st Qu.: 912 1st Qu.: 914 1st Qu.: 0.00
## Mode :character Median :1323 Median :1327 Median : 0.00
## Mean :1330 Mean :1334 Mean : 12.83
## 3rd Qu.:1735 3rd Qu.:1743 3rd Qu.: 6.00
## Max. :2359 Max. :2400 Max. :2755.00
## NA's :80308 NA's :80343
## DEP_DEL15 DEP_DELAY_GROUP CRS_ARR_TIME ARR_TIME
## Min. :0.00 Min. :-2.00 Min. : 1 Min. : 1
## 1st Qu.:0.00 1st Qu.:-1.00 1st Qu.:1103 1st Qu.:1050
## Median :0.00 Median :-1.00 Median :1520 Median :1510
## Mean :0.18 Mean : 0.03 Mean :1489 Mean :1469
## 3rd Qu.:0.00 3rd Qu.: 0.00 3rd Qu.:1920 3rd Qu.:1918
## Max. :1.00 Max. :12.00 Max. :2359 Max. :2400
## NA's :80343 NA's :80343 NA's :84674
## ARR_DELAY_NEW ARR_DEL15 ARR_DELAY_GROUP CANCELLED
## Min. : 0.00 Min. :0.00 Min. :-2.00 Min. :0.00000
## 1st Qu.: 0.00 1st Qu.:0.00 1st Qu.:-1.00 1st Qu.:0.00000
## Median : 0.00 Median :0.00 Median :-1.00 Median :0.00000
## Mean : 12.84 Mean :0.18 Mean :-0.23 Mean :0.01457
## 3rd Qu.: 7.00 3rd Qu.:0.00 3rd Qu.: 0.00 3rd Qu.:0.00000
## Max. :2189.00 Max. :1.00 Max. :12.00 Max. :1.00000
## NA's :95211 NA's :95211 NA's :95211
## CANCELLATION_CODE DISTANCE CARRIER_DELAY WEATHER_DELAY
## Length:5674621 Min. : 31.0 Min. : 0 Min. : 0
## Class :character 1st Qu.: 391.0 1st Qu.: 0 1st Qu.: 0
## Mode :character Median : 680.0 Median : 1 Median : 0
## Mean : 856.7 Mean : 20 Mean : 3
## 3rd Qu.:1097.0 3rd Qu.: 17 3rd Qu.: 0
## Max. :4983.0 Max. :1934 Max. :1934
## NA's :4645148 NA's :4645148
## NAS_DELAY SECURITY_DELAY LATE_AIRCRAFT_DELAY
## Min. : 0 Min. : 0 Min. : 0
## 1st Qu.: 0 1st Qu.: 0 1st Qu.: 0
## Median : 2 Median : 0 Median : 4
## Mean : 16 Mean : 0 Mean : 25
## 3rd Qu.: 19 3rd Qu.: 0 3rd Qu.: 31
## Max. :1605 Max. :827 Max. :1756
## NA's :4645148 NA's :4645148 NA's :4645148
## X
## Length:5674621
## Class :character
## Mode :character
##
##
##
##
The minimum departure delay time is 0 (in this dataset, all early departures are set to 0), and the maximum departure delay time is 2755 minutes (about 46 hours). After checking TripAdvisor and The Ten Worst Flight Delays In History, we believe that the maximum departure delay in this dataset is plausible, so we won't exclude it. Also, to make sure that our conclusions are not driven by potential extreme values, we will perform logistic regressions in addition to linear regressions in our data analysis.
The minimum flight distance is 31 miles, and the longest flight distance is 4983 miles, which are also reasonable (see the shortest US flight route, from Barnstable Municipal Airport on Cape Cod to Nantucket Memorial Airport, and the ~ 4000 miles distance from New York to Hawaii).
month_freq <- table(dat$MONTH)
day_freq <- table(dat$DAY_OF_WEEK)
carrier_freq <- table(dat$OP_UNIQUE_CARRIER)
state <- table(dat$ORIGIN_STATE_ABR)
month_freq <- as.data.frame(month_freq)
day_freq <- as.data.frame(day_freq)
carrier_freq <- as.table(carrier_freq)
state <- as.table(state)
month_freq
## Var1 Freq
## 1 1 450017
## 2 2 410517
## 3 3 488597
## 4 4 468329
## 5 5 486483
## 6 6 494266
## 7 7 509070
## 8 8 510451
## 9 9 458727
## 10 10 479797
## 11 11 454162
## 12 12 464205
day_freq
## Var1 Freq
## 1 1 839772
## 2 2 819499
## 3 3 830854
## 4 4 841765
## 5 5 846443
## 6 6 689412
## 7 7 806876
carrier_freq
##
## AA AS B6 DL EV F9 HA NK OO
## 896348 185068 298654 923560 339541 103027 80172 156818 706527
## UA VX WN
## 584481 70981 1329444
state
##
## AK AL AR AZ CA CO CT FL GA HI
## 36396 22285 13871 172947 756448 244856 22211 457489 377288 104697
## IA ID IL IN KS KY LA MA MD ME
## 13961 22513 362857 40048 9876 34653 66302 127175 101015 6790
## MI MN MO MS MT NC ND NE NH NJ
## 158271 142632 105635 9569 17430 162891 10961 22870 6180 121631
## NM NV NY OH OK OR PA PR RI SC
## 21609 168151 245371 71211 29441 74109 112754 26897 13755 30447
## SD TN TT TX UT VA VI VT WA WI
## 8332 81934 484 557534 115337 143040 5254 3274 153388 50878
## WV WY
## 2124 7549
The frequencies of month, day of week, carrier, and departure state of the flights are all in reasonable range as well.
The dataset is pretty clean, with very few missing values or extreme values. In this project, we excluded all cancelled flights and diverted flights. A flight is considered delayed if it is delayed by 15 minutes or more.
First, we looked at the delay percentage across 12 airlines in U.S.:
The percentage of delayed flights for most airlines is in the range of 15% to 25%. JetBlue has the highest delay rate, at 27%, and Hawaiian Airlines has the lowest, at only 8.4%.
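The code behind this comparison is not shown here, so the following is only a sketch of how a per-airline delay percentage could be computed, assuming the `OP_UNIQUE_CARRIER`, `CANCELLED` and `DEP_DEL15` columns from the dataset. The `toy` data frame is invented for illustration and is not the real data.

```r
library(dplyr)

# Toy stand-in for `dat`; the values are invented for illustration only.
toy <- tibble(
  OP_UNIQUE_CARRIER = c("B6", "B6", "B6", "B6", "HA", "HA", "HA", "HA"),
  CANCELLED         = c(0, 0, 0, 1, 0, 0, 0, 0),
  DEP_DEL15         = c(1, 0, 1, NA, 0, 0, 1, 0)
)

# Share of non-cancelled flights that left 15+ minutes late, per carrier.
delay_pct <- toy %>%
  filter(CANCELLED == 0, !is.na(DEP_DEL15)) %>%
  group_by(OP_UNIQUE_CARRIER) %>%
  summarize(n_flights = n(), delay_pct = 100 * mean(DEP_DEL15)) %>%
  arrange(desc(delay_pct))
delay_pct
```

Because `DEP_DEL15` is a 0/1 indicator, its mean within a carrier is the proportion of delayed flights.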
We continue to investigate the average time of delays (among delayed flights) and reasons.
Among delayed flights, you have to wait 60-80 minutes with most airlines. ExpressJet and SkyWest tend to have the longest waiting times, at more than 80 minutes. Hawaiian Airlines and Southwest Airlines tend to have the shortest delays, at about 50 minutes.
As for the reasons, carrier delay and late aircraft arrival are the two main causes of delay for most airlines. Notably, delays due to weather are not as frequent as we expected! (Carrier: delay due to carrier reasons, such as aircraft cleaning, fueling, maintenance, or awaiting the arrival of connecting passengers and baggage. Late: delay due to the late arrival of the same aircraft at the previous airport. NAS: delay due to the National Airspace System, such as non-extreme weather conditions, heavy traffic volume and air traffic control. Security: delay caused by evacuation of a terminal or concourse. Weather: delay caused by extreme weather.)
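One way to compare delay reasons is to look at each cause's share of the recorded delay minutes per airline. The sketch below assumes the five `*_DELAY` cause columns from the dataset and uses `tidyr::pivot_longer`; the `toy` values are invented, and measuring by minutes (rather than by number of flights) is our assumption here, not necessarily how the plot was built.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for `dat`; the values are invented for illustration only.
toy <- tibble(
  OP_UNIQUE_CARRIER   = c("AA", "AA", "WN", "WN"),
  CARRIER_DELAY       = c(30, 0, 10, NA),
  WEATHER_DELAY       = c(0, 0, 0, NA),
  NAS_DELAY           = c(0, 20, 0, NA),
  SECURITY_DELAY      = c(0, 0, 0, NA),
  LATE_AIRCRAFT_DELAY = c(10, 40, 30, NA)
)

# Reshape to one row per flight and cause, then compute each cause's
# share of the total delay minutes within a carrier.
cause_share <- toy %>%
  pivot_longer(ends_with("_DELAY"), names_to = "cause", values_to = "minutes") %>%
  filter(!is.na(minutes)) %>%
  group_by(OP_UNIQUE_CARRIER, cause) %>%
  summarize(minutes = sum(minutes), .groups = "drop") %>%
  group_by(OP_UNIQUE_CARRIER) %>%
  mutate(share = minutes / sum(minutes)) %>%
  ungroup()
cause_share
```

The cause columns are missing for flights without a reported delay breakdown, which is why rows with `NA` minutes are dropped before summing.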
What time of the day are you most likely to be delayed?
dat0 <- dat %>% select(CRS_DEP_TIME, OP_UNIQUE_CARRIER, DEP_DEL15, DAY_OF_WEEK, DEP_DELAY_NEW)
Generate departure hour:
dat1 <- dat0 %>% mutate(DEP_HOUR=as.integer(as.numeric(CRS_DEP_TIME)/100))
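The scheduled departure time is stored as a number in hhmm form (its range in the summary above is 1 to 2359), so dividing by 100 and truncating gives the hour. A small sketch with made-up times, showing that integer division gives the same result as the `as.integer()` approach:

```r
# Made-up hhmm values for illustration.
crs_dep_time <- c(5, 45, 600, 1323, 2359)

# Integer division by 100 drops the minutes and keeps the hour.
dep_hour <- crs_dep_time %/% 100
dep_hour

# Same result as truncating the quotient, as done for DEP_HOUR.
same_as_truncation <- all(dep_hour == as.integer(crs_dep_time / 100))
same_as_truncation
```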
Generate the overall delay percentage and the overall average delay time, plus the same two measures for each carrier (dat2 holds the percentage; dat3 holds the average delay time in minutes, computed over delayed flights only):
dat2<- dat1 %>% filter(!is.na(DEP_DEL15)) %>%
group_by(DEP_HOUR, DAY_OF_WEEK) %>%
mutate(PERCENTAGE_OVERALL=mean(DEP_DEL15))
dat3<- dat1 %>% filter(!is.na(DEP_DELAY_NEW) & DEP_DEL15==1) %>%
group_by(DEP_HOUR) %>%
mutate(AVERAGE_OVERALL=mean(DEP_DELAY_NEW))
dat4 <- dat2 %>% filter(!is.na(DEP_DEL15)) %>%
group_by(OP_UNIQUE_CARRIER, DAY_OF_WEEK, DEP_HOUR) %>%
mutate(PERCENTAGE=mean(DEP_DEL15))
dat5 <- dat3 %>% filter(!is.na(DEP_DELAY_NEW) & DEP_DEL15==1) %>%
group_by(OP_UNIQUE_CARRIER,DEP_HOUR) %>%
mutate(AVERAGE=mean(DEP_DELAY_NEW))
dat6 <- dat1 %>% filter(!is.na(DEP_DELAY_NEW) & DEP_DEL15==1) %>%
group_by(DAY_OF_WEEK,DEP_HOUR) %>%
mutate(AVERAGE_NEW=mean(DEP_DELAY_NEW))
Heatmap for overall delay percentage:
dat2 %>% ggplot(aes(DEP_HOUR, DAY_OF_WEEK, fill = PERCENTAGE_OVERALL)) +
geom_tile(color = "white") +
scale_fill_gradientn(colors = brewer.pal(9, "Reds"),limits=c(0,1)) +
scale_y_continuous(breaks=seq(1,7))+
theme_minimal() +
theme(panel.grid = element_blank()) +
labs(title="Overall delay percentage in each hour and each day of week",y="Day of week",x="Departure hour")
Overall, 2-4am is the time period with the highest delay rate (about 25%). By contrast, 5-10am is the period with the lowest delay rate (about 10%). Departing in the morning might minimize your probability of encountering a flight delay. The delay rate is higher on Thursday, Friday, Sunday and Monday than on other days.
Heatmap for overall delay time:
dat6 %>% ggplot(aes(DEP_HOUR, DAY_OF_WEEK, fill = AVERAGE_NEW)) +
geom_tile(color = "white") +
scale_fill_gradientn(colors = brewer.pal(9, "Reds")) +
scale_y_continuous(breaks=seq(1,7))+
theme_minimal() +
theme(panel.grid = element_blank()) +
labs(title="Overall delay time (minutes) in each hour and each day of week",y="Day of week",x="Departure hour")
Delay time is among the highest at 1-6am and on Friday, Sunday and Monday.
Bar plot of overall average delay time by departure hour:
dat3 %>% select(DEP_HOUR,AVERAGE_OVERALL) %>%
unique() %>%
ggplot(aes(DEP_HOUR,AVERAGE_OVERALL))+
geom_bar(stat="identity", fill="#720017")+
labs(title="Overall average delay time (minutes) by departure hour",y="Average delay time (minutes)",x="Departure hour")
1-2am and 5-8am have relatively high delay times on average. According to the plot, the worst choice might be a flight scheduled at 5am, which may lead to an average delay of nearly 90 minutes.
We also designed an application to show your delay probability and delay time on average when you choose your airline and departure hour.
From the plot above, we can see that the effect of departure hour and departure day on delay rate does not vary across airlines. The general trend is that the delay rate is higher on Thursday, Friday, Sunday and Monday than on other days. The delay rate is especially high from 0 to 4am, while 5 to 10am seems to be the safest time period for avoiding a flight delay. Overall, Hawaiian Airlines performs the best in terms of delay rate, while JetBlue Airlines performs the worst among all 12 carriers in the database. Interestingly, if you take a Spirit Airlines flight leaving on Saturday at 3am, a United Airlines flight leaving on Thursday or Friday at 4am, or a SkyWest Airlines flight leaving at 0am from Thursday to Saturday, you are going to suffer a flight delay almost 100% of the time.
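A delay rate close to 100% can also arise when a carrier, day and hour combination contains only a handful of flights. The sketch below shows one way to check the cell sizes next to the rates. The `toy` data and the `min_flights` cut-off are hypothetical and chosen only for illustration.

```r
library(dplyr)

# Toy stand-in for `dat1`; the values are invented for illustration only.
toy <- tibble(
  OP_UNIQUE_CARRIER = c("NK", "WN", "WN", "WN", "WN", "WN"),
  DAY_OF_WEEK       = c(6, 6, 6, 6, 6, 6),
  DEP_HOUR          = c(3, 9, 9, 9, 9, 9),
  DEP_DEL15         = c(1, 0, 1, 0, 0, 0)
)

# One row per carrier / day / hour cell, with its size and delay rate.
cell_rates <- toy %>%
  group_by(OP_UNIQUE_CARRIER, DAY_OF_WEEK, DEP_HOUR) %>%
  summarize(n_flights = n(), delay_rate = mean(DEP_DEL15), .groups = "drop")

# Flag cells with too few flights to trust the rate (hypothetical cut-off).
min_flights <- 5
sparse_cells <- cell_rates %>% filter(n_flights < min_flights)
sparse_cells
```

In this toy example the only cell with a 100% delay rate is also the only cell with a single flight.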
The effect of departure hour on average delay time seems to differ among airlines. SkyWest Airlines has astonishingly long delay times. If you have a SkyWest flight scheduled to depart at 0am, you will often need to wait more than 1.5 hours for it. Hawaiian Airlines has the best performance on delay time, as most of its delays are below 1 hour.
# Plotly credentials: supply your own and never publish real keys in a shared document
Sys.setenv("plotly_username"="YOUR_PLOTLY_USERNAME")
Sys.setenv("plotly_api_key"="YOUR_PLOTLY_API_KEY")
# calculate mean departure delay minutes by state and by Seasons
state_delay_spring <- dat %>% filter(DEP_DEL15==1) %>% filter(MONTH %in% c(3,4,5))%>%
group_by(ORIGIN_STATE_ABR) %>%
summarize(mean_delay = mean(DEP_DELAY_NEW, na.rm = TRUE))
state_delay_summer <- dat %>% filter(DEP_DEL15==1) %>% filter(MONTH %in% c(6,7,8))%>%
group_by(ORIGIN_STATE_ABR) %>%
summarize(mean_delay = mean(DEP_DELAY_NEW, na.rm = TRUE))
state_delay_autumn <- dat %>% filter(DEP_DEL15==1) %>% filter(MONTH %in% c(9,10,11))%>%
group_by(ORIGIN_STATE_ABR) %>%
summarize(mean_delay = mean(DEP_DELAY_NEW, na.rm = TRUE))
state_delay_winter <- dat %>% filter(DEP_DEL15==1) %>% filter(MONTH %in% c(12,1,2))%>%
group_by(ORIGIN_STATE_ABR) %>%
summarize(mean_delay = mean(DEP_DELAY_NEW, na.rm = TRUE))
# give state boundaries white borders
l <- list(color = toRGB("white"), width = 2)
# specify some map projection/options
g <- list(
scope = 'usa',
projection = list(type = 'albers usa'),
showlakes = TRUE,
lakecolor = toRGB('white')
)
# make the plot
p_spring <- plot_geo(state_delay_spring, locationmode = 'USA-states') %>%
add_trace(
z = ~mean_delay, locations = ~ORIGIN_STATE_ABR,
color = ~mean_delay, colors = 'Reds'
) %>%
colorbar(title = "Departure delay(min) in spring") %>%
layout(
title = '2017 average departure delay (minutes) by states in Spring',
geo = g
)
p_summer <- plot_geo(state_delay_summer, locationmode = 'USA-states') %>%
add_trace(
z = ~mean_delay, locations = ~ORIGIN_STATE_ABR,
color = ~mean_delay, colors = 'Reds'
) %>%
colorbar(title = "Departure delay(min) in summer") %>%
layout(
title = '2017 average departure delay (minutes) by states in Summer',
geo = g
)
p_autumn <- plot_geo(state_delay_autumn, locationmode = 'USA-states') %>%
add_trace(
z = ~mean_delay, locations = ~ORIGIN_STATE_ABR,
color = ~mean_delay, colors = 'Reds'
) %>%
colorbar(title = "Departure delay(min) in autumn") %>%
layout(
title = '2017 average departure delay (minutes) by states in Autumn',
geo = g
)
p_winter <- plot_geo(state_delay_winter, locationmode = 'USA-states') %>%
add_trace(
z = ~mean_delay, locations = ~ORIGIN_STATE_ABR,
color = ~mean_delay, colors = 'Reds'
) %>%
colorbar(title = "Departure delay(min) in winter") %>%
layout(
title = '2017 average departure delay (minutes) by states in Winter',
geo = g
)
p_season <- subplot(p_spring, p_summer, p_autumn, p_winter, nrows = 2) %>%
layout(title = "2017 average departure delay (minutes) by seasons",
xaxis = list(domain=list(x=c(0,0.5),y=c(0,0.5))),
scene = list(domain=list(x=c(0.5,1),y=c(0,0.5))),
xaxis2 = list(domain=list(x=c(0.5,1),y=c(0.5,1))),
annotations = list(
list(x = 0.2 , y = 1, text = "spring", showarrow = F, xref='paper', yref='paper'),
list(x = 0.8 , y = 1, text = "summer", showarrow = F, xref='paper', yref='paper'),
list(x = 0.2 , y = 0.5, text = "autumn", showarrow = F, xref='paper', yref='paper'),
list(x = 0.8 , y = 0.5, text = "winter", showarrow = F, xref='paper', yref='paper'))
)
p_season
Seasonal changes can also affect flight delays, so we want an overall view of how delay times are distributed across regions in the four seasons. We can see that the states with the longest delay times vary by season. Delays are shorter during fall and longer during spring and summer. You can point to each state to check its average delay time in each season.
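The four seasonal tables can also be produced in one pass by deriving a season label from `MONTH`. This is a sketch of that alternative under the same month groupings (March to May as spring, and so on), using invented `toy` data rather than the real `dat`.

```r
library(dplyr)

# Toy stand-in for `dat`; the values are invented for illustration only.
toy <- tibble(
  MONTH            = c(1, 4, 4, 7, 10, 12),
  ORIGIN_STATE_ABR = c("MA", "MA", "MA", "CA", "CA", "MA"),
  DEP_DEL15        = c(1, 1, 1, 1, 0, 1),
  DEP_DELAY_NEW    = c(40, 30, 50, 20, 0, 60)
)

# Label each delayed flight with its season, then average by season and state.
state_season_delay <- toy %>%
  filter(DEP_DEL15 == 1) %>%
  mutate(season = case_when(
    MONTH %in% c(3, 4, 5)   ~ "spring",
    MONTH %in% c(6, 7, 8)   ~ "summer",
    MONTH %in% c(9, 10, 11) ~ "autumn",
    TRUE                    ~ "winter"
  )) %>%
  group_by(season, ORIGIN_STATE_ABR) %>%
  summarize(mean_delay = mean(DEP_DELAY_NEW, na.rm = TRUE), .groups = "drop")
state_season_delay
```

Filtering this table on `season == "spring"` gives the same kind of table as `state_delay_spring`.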
Which airports tend to experience more delays? Location is another important factor that can affect flight delay. We are curious whether flight delays differ significantly among airports, whether because of weather or heavy traffic volume. Hence, we chose the 10 busiest airports in the US for analysis 4.
Among the 10 busiest airports in the US, the percentage of delayed flights is in the range of 15% to 25%. Newark Liberty International Airport has the highest delay rate, at 25%. George Bush Intercontinental Airport and Washington Dulles International Airport have the lowest delay rates, at 15%.
If a flight is delayed, you have to wait 60-80 minutes at most airports. Although Washington Dulles International Airport has the lowest percentage of flight delays, it has the longest average delay, at more than 90 minutes. John F. Kennedy International Airport ranks second, with an average delay of 88 minutes. Los Angeles International Airport has the lowest average waiting time, at 66 minutes.
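Delay rate and average delay among delayed flights are two different measures, so an airport can rank well on one and poorly on the other. The sketch below computes both per origin airport. The code for the airport plots is not shown here, so this is an assumption about how they could be derived, and the `toy` numbers are invented.

```r
library(dplyr)

# Toy stand-in for `dat`; the values are invented for illustration only.
toy <- tibble(
  ORIGIN        = c(rep("IAD", 5), rep("LAX", 5)),
  DEP_DEL15     = c(1, 0, 0, 0, 0, 1, 1, 0, 0, 0),
  DEP_DELAY_NEW = c(95, 0, 3, 0, 0, 60, 70, 5, 0, 0)
)

# Per airport: share of delayed flights and mean delay among delayed flights.
airport_summary <- toy %>%
  group_by(ORIGIN) %>%
  summarize(
    delay_pct = 100 * mean(DEP_DEL15),
    mean_delay_if_delayed = mean(DEP_DELAY_NEW[DEP_DEL15 == 1]),
    .groups = "drop"
  )
airport_summary
```

In the toy data the airport with the lower delay percentage has the longer average delay, which mirrors the pattern described for Washington Dulles.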
Here, we look into more details about the effect of geographical factors on the flight delays. We will use interactive maps to describe the delay patterns in different US states, cities, and by different flight routes.
In this step, we will describe the average delay times of each state:
# calculate mean departure delay minutes by state
state_delay <- dat %>% filter(DEP_DEL15==1) %>%
group_by(ORIGIN_STATE_ABR) %>%
summarize(mean_delay = mean(DEP_DELAY_NEW, na.rm = TRUE))
# give state boundaries white borders
l <- list(color = toRGB("white"), width = 2)
# specify some map projection/options
g <- list(
scope = 'usa',
projection = list(type = 'albers usa'),
showlakes = TRUE,
lakecolor = toRGB('white')
)
# make the plot
p_state <- plot_geo(state_delay, locationmode = 'USA-states') %>%
add_trace(
z = ~mean_delay, locations = ~ORIGIN_STATE_ABR,
color = ~mean_delay, colors = 'Purples'
) %>%
colorbar(title = "Departure delay in minutes") %>%
layout(
title = '2017 average departure delay (minutes) by states',
geo = g
)
p_state
From the plot, we see that, in general, the Northeast region of the US experienced longer delay times in 2017 (states like Maine or Vermont had average delay times over 20 minutes). For other regions, there seem to be relatively long delay times in the South and on the West Coast.
We then look at the delay time patterns for each departure city: The delay times are categorized into 4 quartiles and shown by colored bubbles, and the size of the bubbles depicts the length of delay time:
# calculate mean departure delay minutes by city
city_delay <- dat %>% filter(DEP_DEL15==1) %>%
group_by(ORIGIN_CITY_NAME) %>%
summarize(mean_delay = mean(DEP_DELAY_NEW, na.rm = TRUE))
city_delay <- cSplit(city_delay, "ORIGIN_CITY_NAME", sep=",")
city_delay <- city_delay %>% mutate(name = ORIGIN_CITY_NAME_1)
# add the coordinates of cities
coordinate <- read.csv('https://raw.githubusercontent.com/plotly/datasets/master/2014_us_cities.csv')
city_delay <- city_delay %>% mutate(name = trimws(as.character(name)))
coordinate <- coordinate %>% mutate(name = trimws(as.character(name)))
merged_city_delay <- left_join(city_delay,coordinate, by='name')
merged_city_delay <- merged_city_delay %>%
group_by(name) %>%
summarize(mean_delay = mean(mean_delay, na.rm = TRUE), lat = mean(lat), lon = mean(lon))
# draw the plot by cities
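# include.lowest = TRUE keeps the city with the smallest mean delay in the first bin
merged_city_delay$q <- with(merged_city_delay, cut(mean_delay, quantile(mean_delay), include.lowest = TRUE))
levels(merged_city_delay$q) <- paste(c("1st", "2nd", "3rd", "4th"), "Quartile")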
merged_city_delay$q <- as.ordered((merged_city_delay$q))
g <- list(
scope = 'usa',
projection = list(type = 'albers usa'),
showland = TRUE,
landcolor = toRGB("gray85"),
subunitwidth = 1,
countrywidth = 1,
subunitcolor = toRGB("white"),
countrycolor = toRGB("white")
)
p_cities <- plot_geo(merged_city_delay, locationmode = 'USA-states', sizes = c(1, 250)) %>%
add_markers(
x = ~lon, y = ~lat, size = ~mean_delay, color = ~q, hoverinfo = "text",
text = ~paste(merged_city_delay$name, "<br />", merged_city_delay$mean_delay, "minutes")
) %>%
layout(title = '2017 average departure delay (minutes) by city', geo = g)
p_cities
## Warning: Ignoring 102 observations
## Warning: `line.width` does not currently support multiple values.
## Warning: `line.width` does not currently support multiple values.
## Warning: `line.width` does not currently support multiple values.
## Warning: `line.width` does not currently support multiple values.
From the plot, we see that similar to the plot by states, cities in the Northeast, South and the West coast are more likely to have delay times at the highest (yellow) or second highest (green) quartiles, with some cities (e.g. St. Augustine in Florida) reaching average delays of more than 60 minutes. Cities with the shortest average delay times are generally in the Midwest area.
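A note on the quartile binning: when `cut()` is given the five values returned by `quantile()` as breaks, it creates four intervals that are open on the left, so the smallest value falls outside the first interval unless `include.lowest = TRUE` is set. The sketch below shows this behaviour on made-up delay values.

```r
# Made-up mean delays for eight cities.
mean_delay <- c(20, 35, 50, 65, 80, 95, 110, 125)
breaks <- quantile(mean_delay)

# Default: intervals are (a, b], so the minimum is left unclassified (NA).
q_default <- cut(mean_delay, breaks)
n_unclassified <- sum(is.na(q_default))
n_unclassified

# Closing the first interval on the left assigns every value to a quartile.
q_closed <- cut(mean_delay, breaks, include.lowest = TRUE,
                labels = paste(c("1st", "2nd", "3rd", "4th"), "quartile"))
table(q_closed)
```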
Next, we look at the flight routes with delays: we will display the routes with an average delay time of 60+, 120+, 180+, and 240+ minutes in 2017:
# group by flight routes and calculate mean departure delay
route_delay <- dat %>% filter(DEP_DEL15==1) %>%
group_by(ORIGIN_CITY_NAME, DEST_CITY_NAME) %>%
summarize(mean_delay = mean(DEP_DELAY_NEW, na.rm = TRUE))
route_delay <- cSplit(route_delay, "ORIGIN_CITY_NAME", sep=",")
route_delay <- cSplit(route_delay, "DEST_CITY_NAME", sep=",")
route_delay <- route_delay %>% mutate(name1 = ORIGIN_CITY_NAME_1, name2 = DEST_CITY_NAME_1)
# add the coordinates of cities
coordinate <- read.csv('https://raw.githubusercontent.com/plotly/datasets/master/2014_us_cities.csv')
route_delay <- route_delay %>% mutate(name1 = trimws(as.character(name1)), name2 = trimws(as.character(name2)))
coordinate <- coordinate %>% mutate(name = trimws(as.character(name)))
merged_1 <- left_join(route_delay,coordinate, by = c("name1" = "name")) %>%
rename(lat1 = lat, lon1 = lon, pop1 = pop) %>%
select(mean_delay, name1, name2, pop1, lat1, lon1)
merged_2 <- left_join(route_delay,coordinate, by = c("name2" = "name")) %>%
rename(lat2 = lat, lon2 = lon, pop2 = pop) %>%
select(mean_delay, name1, name2, pop2, lat2, lon2)
merged_route_delay <- left_join(merged_1, merged_2, by = c("name1", "name2")) %>%
rename(mean_delay = mean_delay.x) %>%
select(mean_delay, name1, name2, pop1, lat1, lon1, pop2, lat2, lon2)
merged_route_delay <- merged_route_delay %>% # get the mean population for each city
group_by(name1, name2) %>%
summarize(mean_delay = mean(mean_delay, na.rm = TRUE),
pop1 = mean(pop1, na.rm = TRUE), pop2 = mean(pop2, na.rm = TRUE),
lat1 = mean(lat1, na.rm = TRUE), lon1 = mean(lon1, na.rm = TRUE),
lat2 = mean(lat2, na.rm = TRUE), lon2 = mean(lon2, na.rm = TRUE))
# map projection
# restrict to >=60, >=120, >=180, >=240 minutes of average delay
delay1 <-merged_route_delay %>%
filter(mean_delay >= 60)
delay2 <-merged_route_delay %>%
filter(mean_delay >= 120)
delay3 <-merged_route_delay %>%
filter(mean_delay >= 180)
delay4 <-merged_route_delay %>%
filter(mean_delay >= 240) %>% filter(!is.na(pop1)) %>% filter(!is.na(pop2))
geo <- list(
scope = 'north america',
projection = list(type = 'azimuthal equal area'),
showland = TRUE,
landcolor = toRGB("gray95"),
countrycolor = toRGB("gray80")
)
p1 <- plot_geo(locationmode = 'USA-states', color = I("red")) %>%
add_markers(
data = delay1, x = ~lon1, y = ~lat1, text = ~name1,
size = ~pop1, hoverinfo = "text", alpha = 0.5
) %>%
add_markers(
data = delay1, x = ~lon2, y = ~lat2, text = ~name2,
size = ~pop2, hoverinfo = "text", alpha = 0.5
) %>%
add_segments(
x = ~lon1, xend = ~lon2,
y = ~lat1, yend = ~lat2,
alpha = 0.3, size = I(1), hoverinfo = "none"
) %>%
layout(
title = '2017 flight routes with >60 min delay',
geo = geo, showlegend = FALSE)
p2 <- plot_geo(locationmode = 'USA-states', color = I("red")) %>%
add_markers(
data = delay2, x = ~lon1, y = ~lat1, text = ~name1,
size = ~pop1, hoverinfo = "text", alpha = 0.5
) %>%
add_markers(
data = delay2, x = ~lon2, y = ~lat2, text = ~name2,
size = ~pop2, hoverinfo = "text", alpha = 0.5
) %>%
add_segments(
x = ~lon1, xend = ~lon2,
y = ~lat1, yend = ~lat2,
alpha = 0.3, size = I(1), hoverinfo = "none"
) %>%
layout(
title = '2017 flight routes with >120 min delay',
geo = geo, showlegend = FALSE)
p3 <- plot_geo(locationmode = 'USA-states', color = I("red")) %>%
add_markers(
data = delay3, x = ~lon1, y = ~lat1, text = ~name1,
size = ~pop1, hoverinfo = "text", alpha = 0.5
) %>%
add_markers(
data = delay3, x = ~lon2, y = ~lat2, text = ~name2,
size = ~pop2, hoverinfo = "text", alpha = 0.5
) %>%
add_segments(
x = ~lon1, xend = ~lon2,
y = ~lat1, yend = ~lat2,
alpha = 0.3, size = I(1), hoverinfo = "none"
) %>%
layout(
title = '2017 flight routes with >180 min delay',
geo = geo, showlegend = FALSE )
p4 <- plot_geo(locationmode = 'USA-states', color = I("red")) %>%
add_markers(
data = delay4, x = ~lon1, y = ~lat1, text = ~name1,
size = ~pop1, hoverinfo = "text", alpha = 0.5
) %>%
add_markers(
data = delay4, x = ~lon2, y = ~lat2, text = ~name2,
size = ~pop2, hoverinfo = "text", alpha = 0.5
) %>%
add_segments(
x = ~lon1, xend = ~lon2,
y = ~lat1, yend = ~lat2,
alpha = 0.3, size = I(1), hoverinfo = "none"
) %>%
layout(
title = '2017 flight routes with >240 min delay',
geo = geo, showlegend = FALSE )
p <- subplot(p1, p2, p3, p4, nrows = 2) %>%
layout(title = "2017 flight routes with different delay times",
xaxis = list(domain=list(x=c(0,0.5),y=c(0,0.5))),
scene = list(domain=list(x=c(0.5,1),y=c(0,0.5))),
xaxis2 = list(domain=list(x=c(0.5,1),y=c(0.5,1))),
annotations = list(
list(x = 0.2 , y = 1, text = ">60 mins", showarrow = F, xref='paper', yref='paper'),
list(x = 0.8 , y = 1, text = ">120 mins", showarrow = F, xref='paper', yref='paper'),
list(x = 0.2 , y = 0.5, text = ">180 mins", showarrow = F, xref='paper', yref='paper'),
list(x = 0.8 , y = 0.5, text = ">240 mins", showarrow = F, xref='paper', yref='paper'))
)
## Warning: Ignoring 475 observations
## Warning: Ignoring 440 observations
## Warning: `line.width` does not currently support multiple values.
## Warning: `line.width` does not currently support multiple values.
## Warning: Ignoring 475 observations
## Warning: Ignoring 40 observations
## Warning: `line.width` does not currently support multiple values.
## Warning: `line.width` does not currently support multiple values.
## Warning: Ignoring 11 observations
## Warning: Ignoring 10 observations
## Warning: `line.width` does not currently support multiple values.
## Warning: `line.width` does not currently support multiple values.
## Warning: Ignoring 11 observations
## Warning: `line.width` does not currently support multiple values.
## Warning: `line.width` does not currently support multiple values.
p
We can see from the plot that as the threshold for delays increases, the number of routes with the corresponding delay time decreases. Many routes exceeded the lowest threshold in 2017, but only very few remained at the two highest thresholds (e.g. the route between New York and San Antonio).
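The drop in the number of routes can also be read off as a simple count per threshold. The sketch below uses the four cut-offs from the plot titles (60, 120, 180 and 240 minutes) on an invented `toy_routes` table; the route names and delay values are made up and do not come from the data.

```r
# Toy stand-in for `merged_route_delay`; values are invented for illustration.
toy_routes <- data.frame(
  name1      = c("New York", "Boston", "Chicago", "Denver", "Miami"),
  name2      = c("San Antonio", "Chicago", "Denver", "Seattle", "Atlanta"),
  mean_delay = c(250, 70, 130, 45, 185)
)

# Count the routes whose average delay meets each threshold.
thresholds <- c(60, 120, 180, 240)
routes_over <- sapply(thresholds, function(th) sum(toy_routes$mean_delay >= th))
names(routes_over) <- paste0(">=", thresholds)
routes_over
```

Each higher threshold keeps a subset of the routes kept by the one before it, so the counts can only stay the same or shrink.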
After gaining an overview of the delay patterns by various factors, we wish to make predictions of delay times. We will use linear regression models to predict mean delay times, and logistic regression models to predict the probability of delay (>= 15 minutes).
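To make the logistic part concrete, here is a minimal sketch of a logistic regression for the delay indicator, fitted with base R's `glm()`. The data are simulated, and the predictor names `carrier` and `dep_hour`, as well as the simulated effect sizes, are assumptions for illustration only, not our fitted model.

```r
set.seed(1)
n <- 2000

# Simulated flights: carrier has no true effect, later hours are more delayed.
toy <- data.frame(
  carrier  = factor(sample(c("AA", "HA", "WN"), n, replace = TRUE)),
  dep_hour = sample(0:23, n, replace = TRUE)
)
p_true <- plogis(-2.5 + 0.08 * toy$dep_hour)
toy$DEP_DEL15 <- rbinom(n, 1, p_true)

# Logistic regression of the 0/1 delay indicator on carrier and hour.
delay.glm <- glm(DEP_DEL15 ~ carrier + dep_hour, data = toy, family = binomial)

# Exponentiated coefficients are odds ratios.
odds_ratios <- exp(coef(delay.glm))
odds_ratios

# Predicted delay probability for a hypothetical 8am flight.
new_flight <- data.frame(
  carrier  = factor("HA", levels = levels(toy$carrier)),
  dep_hour = 8
)
p_hat <- predict(delay.glm, newdata = new_flight, type = "response")
p_hat
```

Because the outcome is only delayed versus not delayed, a single 46-hour delay carries no more weight than any other delayed flight, which is why this model is less sensitive to extreme delay times.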
In this part, we use linear models to predict mean delay times. Our predictors of interest are carrier, day of week and time of day. We look at each of them separately, first in a crude model and then in an adjusted model that incorporates these factors: (1) carrier, (2) month, (3) day of week, (4) distance of flight route, (5) time of day, and (6) region of departure.
# crude, predictor: carrier
df <- dat %>%
filter(DEP_DELAY_NEW>=15)
delay.lm = lm(DEP_DELAY_NEW ~ OP_UNIQUE_CARRIER, data = df)
lsmeans(delay.lm, ~ OP_UNIQUE_CARRIER)
## OP_UNIQUE_CARRIER lsmean SE df lower.CL upper.CL
## AA 64.44212 0.2136694 1013833 64.02333 64.86090
## AS 53.62569 0.5244802 1013833 52.59773 54.65365
## B6 72.76209 0.2904053 1013833 72.19291 73.33128
## DL 69.02952 0.2200936 1013833 68.59814 69.46089
## EV 85.72511 0.3212330 1013833 85.09550 86.35472
## F9 70.47161 0.5549647 1013833 69.38390 71.55933
## HA 48.57617 0.9886941 1013833 46.63837 50.51398
## NK 72.09632 0.4730808 1013833 71.16909 73.02354
## OO 85.21249 0.2381762 1013833 84.74567 85.67930
## UA 71.50244 0.2587640 1013833 70.99528 72.00961
## VX 60.78735 0.6073894 1013833 59.59689 61.97781
## WN 47.68216 0.1516620 1013833 47.38491 47.97942
##
## Confidence level used: 0.95
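In a crude model with a single categorical predictor, the fitted value for each carrier is simply that carrier's average delay, so the least-squares means above can be read as mean delay minutes among delayed flights. A small base R sketch with invented numbers illustrates the equivalence:

```r
# Toy stand-in for `df`; the values are invented for illustration only.
toy <- data.frame(
  OP_UNIQUE_CARRIER = c("AA", "AA", "AA", "HA", "HA", "WN", "WN", "WN"),
  DEP_DELAY_NEW     = c(30, 60, 90, 20, 40, 15, 45, 60)
)

# Crude model: delay time explained by carrier only.
fit <- lm(DEP_DELAY_NEW ~ OP_UNIQUE_CARRIER, data = toy)

# Raw group means per carrier.
group_means <- tapply(toy$DEP_DELAY_NEW, toy$OP_UNIQUE_CARRIER, mean)

# Model-based means: one prediction per carrier.
model_means <- predict(fit, newdata = data.frame(OP_UNIQUE_CARRIER = names(group_means)))

rbind(group_means, model_means)
```

Once other covariates are added, as in the adjusted model, the carrier effects are no longer plain group means.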
# adjusted, predictor: carrier
df <- df %>%
mutate(hour_cat = cut(DEP_TIME, breaks=c(-Inf, 600, 1200, 1800, Inf), labels=c("0 to 6","6 to 12","12 to 18", "18 to 24"))) %>%
mutate(NORTHEAST = ifelse(ORIGIN_STATE_ABR %in% c("CT","ME", "MA", "NH","RI","VT","NJ","NY","PA"), "yes", "no")) %>%
mutate(MIDWEST = ifelse(ORIGIN_STATE_ABR %in% c("IL","IN","MI","OH","WI","IA","KS","MN","MO","NE","ND","SD"), "yes", "no")) %>%
mutate(SOUTH = ifelse(ORIGIN_STATE_ABR %in% c("DE","FL","GA","MD","NC","SC","VA","DC","WV","AL","KY","MS","TN","AR","LA","OK","TX"), "yes", "no")) %>%
mutate(WEST = ifelse(ORIGIN_STATE_ABR %in% c("AZ","CO","ID","MT","NV","NM","UT","WY","AK","CA","HI","OR","WA"), "yes", "no")) %>%
mutate(SPRING = ifelse(MONTH %in% c(3,4,5),"yes","no")) %>%
mutate(SUMMER = ifelse(MONTH %in% c(6,7,8),"yes","no")) %>%
mutate(FALL = ifelse(MONTH %in% c(9,10,11),"yes","no")) %>%
mutate(WINTER = ifelse(MONTH %in% c(12,1,2),"yes","no"))
delay2.lm = lm(DEP_DELAY_NEW ~ OP_UNIQUE_CARRIER + MONTH + factor(DAY_OF_WEEK) + DISTANCE + hour_cat + NORTHEAST + MIDWEST + SOUTH + WEST, data = df)
summary(delay2.lm)
##
## Call:
## lm(formula = DEP_DELAY_NEW ~ OP_UNIQUE_CARRIER + MONTH + factor(DAY_OF_WEEK) +
## DISTANCE + hour_cat + NORTHEAST + MIDWEST + SOUTH + WEST,
## data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -148.59 -37.48 -19.50 11.24 2700.80
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.331e+02 1.162e+00 114.536 < 2e-16 ***
## OP_UNIQUE_CARRIERAS -7.145e+00 5.736e-01 -12.456 < 2e-16 ***
## OP_UNIQUE_CARRIERB6 2.683e+00 3.657e-01 7.336 2.20e-13 ***
## OP_UNIQUE_CARRIERDL 4.147e+00 3.039e-01 13.646 < 2e-16 ***
## OP_UNIQUE_CARRIEREV 2.055e+01 3.915e-01 52.492 < 2e-16 ***
## OP_UNIQUE_CARRIERF9 4.009e+00 5.898e-01 6.797 1.07e-11 ***
## OP_UNIQUE_CARRIERHA -8.929e+00 1.011e+00 -8.835 < 2e-16 ***
## OP_UNIQUE_CARRIERNK 4.465e+00 5.134e-01 8.697 < 2e-16 ***
## OP_UNIQUE_CARRIEROO 2.426e+01 3.412e-01 71.096 < 2e-16 ***
## OP_UNIQUE_CARRIERUA 7.436e+00 3.349e-01 22.202 < 2e-16 ***
## OP_UNIQUE_CARRIERVX -4.653e-02 6.451e-01 -0.072 0.942504
## OP_UNIQUE_CARRIERWN -1.508e+01 2.678e-01 -56.313 < 2e-16 ***
## MONTH -3.640e-01 2.414e-02 -15.076 < 2e-16 ***
## factor(DAY_OF_WEEK)2 -4.149e+00 2.966e-01 -13.987 < 2e-16 ***
## factor(DAY_OF_WEEK)3 -2.696e+00 2.929e-01 -9.206 < 2e-16 ***
## factor(DAY_OF_WEEK)4 -3.668e+00 2.811e-01 -13.049 < 2e-16 ***
## factor(DAY_OF_WEEK)5 -1.016e+00 2.768e-01 -3.670 0.000243 ***
## factor(DAY_OF_WEEK)6 -1.031e+00 3.124e-01 -3.301 0.000965 ***
## factor(DAY_OF_WEEK)7 -1.402e+00 2.918e-01 -4.803 1.56e-06 ***
## DISTANCE -8.189e-04 1.386e-04 -5.907 3.48e-09 ***
## hour_cat6 to 12 -7.637e+01 5.505e-01 -138.723 < 2e-16 ***
## hour_cat12 to 18 -7.571e+01 5.369e-01 -141.010 < 2e-16 ***
## hour_cat18 to 24 -6.301e+01 5.377e-01 -117.200 < 2e-16 ***
## NORTHEASTyes 1.071e+01 1.024e+00 10.455 < 2e-16 ***
## MIDWESTyes 5.719e+00 1.030e+00 5.551 2.83e-08 ***
## SOUTHyes 6.234e+00 1.015e+00 6.141 8.19e-10 ***
## WESTyes 1.373e+00 1.018e+00 1.349 0.177480
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 79.81 on 1013818 degrees of freedom
## Multiple R-squared: 0.05025, Adjusted R-squared: 0.05023
## F-statistic: 2063 on 26 and 1013818 DF, p-value: < 2.2e-16
lsmeans(delay2.lm, ~ OP_UNIQUE_CARRIER)
## OP_UNIQUE_CARRIER lsmean SE df lower.CL upper.CL
## AA 86.30274 1.054612 1013818 84.23574 88.36974
## AS 79.15797 1.153785 1013818 76.89659 81.41936
## B6 88.98538 1.103015 1013818 86.82350 91.14725
## DL 90.44956 1.050768 1013818 88.39009 92.50903
## EV 106.85069 1.077871 1013818 104.73810 108.96328
## F9 90.31143 1.163980 1013818 88.03007 92.59280
## HA 77.37354 1.424548 1013818 74.58147 80.16561
## NK 90.76775 1.133621 1013818 88.54589 92.98961
## OO 110.55841 1.056248 1013818 108.48820 112.62862
## UA 93.73895 1.062516 1013818 91.65646 95.82145
## VX 86.25621 1.192479 1013818 83.91899 88.59343
## WN 71.22161 1.041578 1013818 69.18016 73.26307
##
## Results are averaged over the levels of: DAY_OF_WEEK, hour_cat, NORTHEAST, MIDWEST, SOUTH, WEST
## Confidence level used: 0.95
We summarized the above results into the table below:
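From the adjusted predicted mean delay times for each carrier, with all other factors held at their average, Southwest (WN) has the shortest predicted delay (about 71 minutes), followed by Hawaiian (HA, about 77 minutes) and Alaska (AS, about 79 minutes). American (AA, about 86 minutes), JetBlue (B6, about 89 minutes) and Delta (DL, about 90 minutes) fall in the middle, and the regional carriers ExpressJet (EV, about 107 minutes) and SkyWest (OO, about 111 minutes) have the longest predicted delays. Note that these means describe flights that were already delayed by 15 minutes or more.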
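To make "all other factors held at their average" concrete, the adjusted means can be reproduced by hand. The sketch below is only an illustration on simulated data (the data frame `toy` and its columns are made up, not the BTS file). It predicts the delay for each carrier on a grid containing every level of the other factor, holds the numeric covariate at its mean, and averages the predictions with equal weight per level, which is our understanding of what lsmeans does by default.

```r
# Minimal sketch on simulated data; `toy`, `carrier`, `day` and `distance`
# are illustrative names, not columns of the BTS file.
set.seed(1)
n <- 600
toy <- data.frame(
  carrier  = factor(sample(c("AA", "B6", "WN"), n, replace = TRUE)),
  day      = factor(sample(1:7, n, replace = TRUE)),
  distance = runif(n, 100, 2500)
)
toy$delay <- 60 + 8 * (toy$carrier == "B6") - 15 * (toy$carrier == "WN") +
  3 * (toy$day == "1") + rnorm(n, sd = 20)

fit <- lm(delay ~ carrier + day + distance, data = toy)

# Reference grid: every carrier x day combination, distance held at its mean.
grid <- expand.grid(
  carrier  = levels(toy$carrier),
  day      = levels(toy$day),
  distance = mean(toy$distance)
)
grid$pred <- predict(fit, newdata = grid)

# Adjusted mean per carrier: equal-weight average over the day levels.
adj_means <- tapply(grid$pred, grid$carrier, mean)
print(round(adj_means, 2))
```

In an additive model like this one, the difference between two adjusted means equals the corresponding carrier coefficient, which is why the adjusted ranking of carriers follows the sign and size of the OP_UNIQUE_CARRIER estimates.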
# crude, predictor: day of week
delay_day.lm = lm(DEP_DELAY_NEW ~ factor(DAY_OF_WEEK), data = df)
lsmeans(delay_day.lm, ~ DAY_OF_WEEK)
## DAY_OF_WEEK lsmean SE df lower.CL upper.CL
## 1 68.09341 0.2035731 1013838 67.69442 68.49241
## 2 63.18018 0.2259230 1013838 62.73738 63.62298
## 3 64.22169 0.2206569 1013838 63.78921 64.65417
## 4 63.26493 0.2038631 1013838 62.86537 63.66450
## 5 66.05751 0.1975888 1013838 65.67024 66.44478
## 6 64.94108 0.2464242 1013838 64.45809 65.42406
## 7 66.71645 0.2191147 1013838 66.28700 67.14591
##
## Confidence level used: 0.95
# adjusted, predictor: day of week
lsmeans(delay2.lm, ~ DAY_OF_WEEK )
## DAY_OF_WEEK lsmean SE df lower.CL upper.CL
## 1 91.32565 1.054609 1013818 89.25865 93.39265
## 2 87.17708 1.059718 1013818 85.10007 89.25409
## 3 88.62944 1.058722 1013818 86.55438 90.70450
## 4 87.65786 1.054795 1013818 85.59050 89.72522
## 5 90.30979 1.053053 1013818 88.24584 92.37374
## 6 90.29442 1.067013 1013818 88.20311 92.38573
## 7 89.92408 1.058917 1013818 87.84864 91.99952
##
## Results are averaged over the levels of: OP_UNIQUE_CARRIER, hour_cat, NORTHEAST, MIDWEST, SOUTH, WEST
## Confidence level used: 0.95
We summarized the above results into the table below:
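From the adjusted predicted mean delay times by day of week (BTS codes 1 = Monday through 7 = Sunday), with all other factors held at their average, flights on Tuesday, Thursday and Wednesday have the shortest predicted delays (about 87 to 89 minutes), while flights on Monday have the longest (about 91 minutes), followed closely by Friday and Saturday (about 90 minutes each). The spread across days is only about 4 minutes.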
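Because the output labels the days only by number, it helps to attach names before ranking them. The short sketch below copies the adjusted means from the output above (rounded to two decimals) and assumes the BTS coding of 1 = Monday through 7 = Sunday.

```r
# BTS codes DAY_OF_WEEK as 1 = Monday through 7 = Sunday.
day_names <- c("Monday", "Tuesday", "Wednesday", "Thursday",
               "Friday", "Saturday", "Sunday")

# Adjusted means copied from the lsmeans output above (rounded to 2 decimals).
adj_day <- c(91.33, 87.18, 88.63, 87.66, 90.31, 90.29, 89.92)
names(adj_day) <- day_names

# Sort from shortest to longest predicted delay.
print(sort(adj_day))
```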
# crude, predictor: time of day
# (the fit below reuses the name delay_day.lm, overwriting the day-of-week model)
delay_day.lm = lm(DEP_DELAY_NEW ~ factor(hour_cat), data = df)
lsmeans(delay_day.lm, ~ hour_cat)
## hour_cat lsmean SE df lower.CL upper.CL
## 0 to 6 134.31299 0.5217897 1013841 133.29030 135.33568
## 6 to 12 59.61137 0.1806714 1013841 59.25726 59.96548
## 12 to 18 58.66513 0.1283653 1013841 58.41354 58.91672
## 18 to 24 70.69035 0.1295534 1013841 70.43643 70.94427
##
## Confidence level used: 0.95
# adjusted, predictor: time of day
lsmeans(delay2.lm, ~ hour_cat )
## hour_cat lsmean SE df lower.CL upper.CL
## 0 to 6 143.10500 1.174924 1013818 140.80219 145.40781
## 6 to 12 66.73209 1.035838 1013818 64.70188 68.76229
## 12 to 18 67.39661 1.032882 1013818 65.37220 69.42103
## 18 to 24 80.09105 1.027952 1013818 78.07630 82.10580
##
## Results are averaged over the levels of: OP_UNIQUE_CARRIER, DAY_OF_WEEK, NORTHEAST, MIDWEST, SOUTH, WEST
## Confidence level used: 0.95
We summarized the above results into the table below:
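From the adjusted model, with all other factors held at their average, flights departing in the morning (6:00 to 12:00) or afternoon (12:00 to 18:00) have the shortest predicted delays (about 67 minutes each), flights departing in the evening (18:00 to 24:00) have longer ones (about 80 minutes), and flights departing between 0:00 and 6:00 have by far the longest (about 143 minutes). Because hour_cat is built from the actual departure time DEP_TIME, the 0:00 to 6:00 group likely includes many flights that were scheduled for the previous evening and pushed past midnight.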
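The time-of-day groups depend on how cut() treats the boundary values. The sketch below uses made-up hhmm values to show that, with the default right-closed intervals used in this post, a departure at exactly 600 is placed in "0 to 6", whereas right = FALSE would place it in "6 to 12". BTS also publishes a scheduled departure time (CRS_DEP_TIME); whether that field was downloaded for this analysis is not stated, so it is not used here.

```r
# DEP_TIME is an hhmm integer (actual departure); values here are made up.
dep_time <- c(5, 559, 600, 601, 1200, 1759, 1800, 2359, 2400)

labels <- c("0 to 6", "6 to 12", "12 to 18", "18 to 24")

# As in the post: intervals are closed on the right, so 600 lands in "0 to 6".
right_closed <- cut(dep_time, breaks = c(-Inf, 600, 1200, 1800, Inf),
                    labels = labels)

# Alternative: closed on the left, so 600 lands in "6 to 12".
left_closed <- cut(dep_time, breaks = c(-Inf, 600, 1200, 1800, Inf),
                   labels = labels, right = FALSE)

print(data.frame(dep_time, right_closed, left_closed))
```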
# stratification
# predictor: carrier
# stratified by day of week
lsmeans(delay2.lm, ~ OP_UNIQUE_CARRIER*DAY_OF_WEEK )
## OP_UNIQUE_CARRIER DAY_OF_WEEK lsmean SE df lower.CL upper.CL
## AA 1 88.29720 1.069390 1013818 86.20123 90.39317
## AS 1 81.15243 1.167325 1013818 78.86452 83.44035
## B6 1 90.97984 1.117285 1013818 88.78999 93.16968
## DL 1 92.44402 1.064821 1013818 90.35701 94.53103
## EV 1 108.84515 1.091626 1013818 106.70560 110.98470
## F9 1 92.30589 1.177318 1013818 89.99839 94.61340
## HA 1 79.36800 1.436411 1013818 76.55268 82.18332
## NK 1 92.76221 1.147410 1013818 90.51332 95.01109
## OO 1 112.55287 1.070404 1013818 110.45492 114.65083
## UA 1 95.73341 1.076705 1013818 93.62311 97.84372
## VX 1 88.25067 1.205014 1013818 85.88888 90.61246
## WN 1 73.21607 1.056875 1013818 71.14463 75.28751
## AA 2 84.14863 1.074666 1013818 82.04232 86.25494
## AS 2 77.00386 1.171917 1013818 74.70695 79.30078
## B6 2 86.83127 1.121694 1013818 84.63278 89.02975
## DL 2 88.29545 1.070569 1013818 86.19717 90.39373
## EV 2 104.69658 1.096897 1013818 102.54670 106.84646
## F9 2 88.15732 1.181370 1013818 85.84188 90.47277
## HA 2 75.21943 1.439157 1013818 72.39873 78.04013
## NK 2 88.61364 1.152254 1013818 86.35526 90.87202
## OO 2 108.40430 1.075574 1013818 106.29621 110.51239
## UA 2 91.58484 1.082112 1013818 89.46394 93.70575
## VX 2 84.10210 1.210309 1013818 81.72994 86.47427
## WN 2 69.06750 1.061475 1013818 66.98705 71.14796
## AA 3 85.60099 1.073324 1013818 83.49732 87.70467
## AS 3 78.45623 1.170824 1013818 76.16145 80.75100
## B6 3 88.28363 1.121069 1013818 86.08637 90.48089
## DL 3 89.74781 1.070117 1013818 87.65042 91.84521
## EV 3 106.14895 1.096250 1013818 104.00033 108.29756
## F9 3 89.60969 1.180781 1013818 87.29540 91.92398
## HA 3 76.67179 1.438230 1013818 73.85291 79.49068
## NK 3 90.06600 1.151269 1013818 87.80955 92.32245
## OO 3 109.85667 1.074980 1013818 107.74974 111.96359
## UA 3 93.03721 1.080647 1013818 90.91917 95.15524
## VX 3 85.55446 1.209239 1013818 83.18440 87.92453
## WN 3 70.51987 1.060196 1013818 68.44192 72.59782
## AA 4 84.62942 1.069538 1013818 82.53316 86.72567
## AS 4 77.48465 1.166991 1013818 75.19739 79.77191
## B6 4 87.31205 1.117386 1013818 85.12201 89.50209
## DL 4 88.77624 1.065349 1013818 86.68819 90.86428
## EV 4 105.17737 1.092415 1013818 103.03627 107.31846
## F9 4 88.63811 1.178259 1013818 86.32876 90.94746
## HA 4 75.70022 1.436200 1013818 72.88531 78.51512
## NK 4 89.09442 1.147609 1013818 86.84515 91.34370
## OO 4 108.88509 1.071016 1013818 106.78593 110.98424
## UA 4 92.06563 1.076844 1013818 89.95505 94.17620
## VX 4 84.58289 1.205170 1013818 82.22079 86.94498
## WN 4 69.54829 1.055951 1013818 67.47866 71.61792
## AA 5 87.28134 1.067753 1013818 85.18858 89.37410
## AS 5 80.13658 1.165795 1013818 77.85166 82.42150
## B6 5 89.96398 1.115595 1013818 87.77745 92.15051
## DL 5 91.42816 1.063559 1013818 89.34362 93.51270
## EV 5 107.82930 1.090716 1013818 105.69153 109.96706
## F9 5 91.29004 1.177092 1013818 88.98298 93.59710
## HA 5 78.35214 1.433452 1013818 75.54263 81.16166
## NK 5 91.74635 1.146215 1013818 89.49981 93.99289
## OO 5 111.53701 1.069719 1013818 109.44040 113.63363
## UA 5 94.71755 1.075473 1013818 92.60966 96.82545
## VX 5 87.23481 1.203664 1013818 84.87567 89.59396
## WN 5 72.20022 1.054648 1013818 70.13314 74.26729
## AA 6 87.26597 1.081337 1013818 85.14659 89.38535
## AS 6 80.12120 1.178615 1013818 77.81116 82.43125
## B6 6 89.94860 1.128415 1013818 87.73695 92.16026
## DL 6 91.41279 1.078444 1013818 89.29908 93.52650
## EV 6 107.81392 1.105340 1013818 105.64749 109.98035
## F9 6 91.27466 1.187159 1013818 88.94787 93.60146
## HA 6 78.33677 1.443095 1013818 75.50835 81.16519
## NK 6 91.73098 1.157420 1013818 89.46247 93.99948
## OO 6 111.52164 1.083596 1013818 109.39783 113.64545
## UA 6 94.70218 1.090356 1013818 92.56512 96.83924
## VX 6 87.21944 1.217311 1013818 84.83355 89.60533
## WN 6 72.18484 1.069079 1013818 70.08948 74.28020
## AA 7 86.89563 1.073037 1013818 84.79252 88.99875
## AS 7 79.75087 1.171064 1013818 77.45562 82.04611
## B6 7 89.57827 1.120954 1013818 87.38124 91.77530
## DL 7 91.04245 1.069725 1013818 88.94583 93.13908
## EV 7 107.44359 1.095915 1013818 105.29563 109.59154
## F9 7 90.90433 1.180931 1013818 88.58974 93.21891
## HA 7 77.96643 1.439523 1013818 75.14502 80.78785
## NK 7 91.36064 1.151247 1013818 89.10423 93.61704
## OO 7 111.15130 1.075013 1013818 109.04432 113.25829
## UA 7 94.33185 1.081300 1013818 92.21253 96.45116
## VX 7 86.84910 1.208960 1013818 84.47958 89.21862
## WN 7 71.81451 1.061135 1013818 69.73472 73.89430
##
## Results are averaged over the levels of: hour_cat, NORTHEAST, MIDWEST, SOUTH, WEST
## Confidence level used: 0.95
We summarized the above results (stratified by day of week) into the table below. Consistent with our previous findings, predicted delays are shortest on Tuesday (day 2) and longest on Monday (day 1). On every day Southwest (WN) has the shortest predicted delay, followed by Hawaiian (HA), and SkyWest (OO) has the longest. Because delay2.lm contains no carrier-by-day interaction term, the carrier ordering is necessarily the same on all seven days and only the overall level shifts. So a Monday departure on SkyWest is probably the combination to avoid!
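One point worth checking in the output above is that the gap between any two carriers is identical on every day (for example, AA minus WN is 15.08 minutes on all seven days). That is what an additive model produces. Letting the carrier ranking differ by day would require an interaction term. The sketch below, on simulated data with made-up names, contrasts the two specifications. It is an illustration of the idea, not a re-analysis of the BTS data.

```r
# Simulated data; `toy`, `carrier` and `day` are illustrative names.
set.seed(2)
n <- 2000
toy <- data.frame(
  carrier = factor(sample(c("AA", "B6", "WN"), n, replace = TRUE)),
  day     = factor(sample(1:7, n, replace = TRUE))
)
# Simulated truth: B6 is worse than AA only on day 5.
toy$delay <- 60 + 20 * (toy$carrier == "B6" & toy$day == "5") +
  rnorm(n, sd = 15)

additive <- lm(delay ~ carrier + day, data = toy)   # no interaction
with_int <- lm(delay ~ carrier * day, data = toy)   # carrier-by-day interaction

grid <- expand.grid(carrier = levels(toy$carrier), day = levels(toy$day))
grid$pred_add <- predict(additive, newdata = grid)
grid$pred_int <- predict(with_int, newdata = grid)

# Predicted B6 minus AA gap on each day for a given prediction column.
gap_b6_aa <- function(col) {
  m <- tapply(grid[[col]], list(grid$day, grid$carrier), mean)
  m[, "B6"] - m[, "AA"]
}
gap_additive <- gap_b6_aa("pred_add")
gap_interact <- gap_b6_aa("pred_int")
print(round(cbind(gap_additive, gap_interact), 2))

# F-test for whether the interaction terms improve the fit.
print(anova(additive, with_int))
```

The additive column is constant by construction, while the interaction column can pick up a day-specific difference.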
# stratification
# predictor: carrier
# stratified by time of day
lsmeans(delay2.lm, ~ OP_UNIQUE_CARRIER*hour_cat )
## OP_UNIQUE_CARRIER hour_cat lsmean SE df lower.CL upper.CL
## AA 0 to 6 140.07656 1.188703 1013818 137.74674 142.40638
## AS 0 to 6 132.93179 1.278937 1013818 130.42512 135.43846
## B6 0 to 6 142.75919 1.224120 1013818 140.35996 145.15842
## DL 0 to 6 144.22338 1.184001 1013818 141.90278 146.54398
## EV 0 to 6 160.62451 1.209686 1013818 158.25356 162.99545
## F9 0 to 6 144.08525 1.279970 1013818 141.57655 146.59395
## HA 0 to 6 131.14736 1.530012 1013818 128.14858 134.14613
## NK 0 to 6 144.54156 1.253775 1013818 142.08420 146.99892
## OO 0 to 6 164.33223 1.192376 1013818 161.99521 166.66924
## UA 0 to 6 147.51277 1.195509 1013818 145.16961 149.85593
## VX 0 to 6 140.03003 1.316445 1013818 137.44984 142.61021
## WN 0 to 6 124.99543 1.179585 1013818 122.68348 127.30738
## AA 6 to 12 63.70364 1.050290 1013818 61.64511 65.76217
## AS 6 to 12 56.55887 1.148643 1013818 54.30757 58.81018
## B6 6 to 12 66.38627 1.102992 1013818 64.22445 68.54810
## DL 6 to 12 67.85046 1.047083 1013818 65.79821 69.90271
## EV 6 to 12 84.25159 1.073163 1013818 82.14823 86.35495
## F9 6 to 12 67.71233 1.162382 1013818 65.43410 69.99056
## HA 6 to 12 54.77444 1.420556 1013818 51.99020 57.55868
## NK 6 to 12 68.16865 1.131908 1013818 65.95014 70.38715
## OO 6 to 12 87.95931 1.050498 1013818 85.90037 90.01825
## UA 6 to 12 71.13985 1.058600 1013818 69.06503 73.21467
## VX 6 to 12 63.65711 1.187310 1013818 61.33002 65.98420
## WN 6 to 12 48.62251 1.038192 1013818 46.58769 50.65734
## AA 12 to 18 64.36817 1.047627 1013818 62.31485 66.42148
## AS 12 to 18 57.22340 1.147739 1013818 54.97387 59.47293
## B6 12 to 18 67.05080 1.099430 1013818 64.89596 69.20565
## DL 12 to 18 68.51499 1.044078 1013818 66.46863 70.56135
## EV 12 to 18 84.91612 1.070504 1013818 82.81797 87.01427
## F9 12 to 18 68.37686 1.160889 1013818 66.10156 70.65216
## HA 12 to 18 55.43897 1.416716 1013818 52.66225 58.21568
## NK 12 to 18 68.83317 1.130041 1013818 66.61833 71.04802
## OO 12 to 18 88.62384 1.047954 1013818 86.56988 90.67779
## UA 12 to 18 71.80438 1.055664 1013818 69.73531 73.87345
## VX 12 to 18 64.32164 1.185139 1013818 61.99880 66.64447
## WN 12 to 18 49.28704 1.033076 1013818 47.26225 51.31184
## AA 18 to 24 77.06260 1.042952 1013818 75.01845 79.10675
## AS 18 to 24 69.91783 1.142406 1013818 67.67876 72.15691
## B6 18 to 24 79.74523 1.093314 1013818 77.60238 81.88809
## DL 18 to 24 81.20942 1.039563 1013818 79.17191 83.24693
## EV 18 to 24 97.61055 1.067146 1013818 95.51898 99.70212
## F9 18 to 24 81.07129 1.155358 1013818 78.80683 83.33576
## HA 18 to 24 68.13340 1.415601 1013818 65.35887 70.90793
## NK 18 to 24 81.52761 1.123743 1013818 79.32511 83.73010
## OO 18 to 24 101.31827 1.044845 1013818 99.27041 103.36613
## UA 18 to 24 84.49881 1.050731 1013818 82.43941 86.55821
## VX 18 to 24 77.01607 1.180387 1013818 74.70255 79.32959
## WN 18 to 24 61.98147 1.027424 1013818 59.96776 63.99519
##
## Results are averaged over the levels of: DAY_OF_WEEK, NORTHEAST, MIDWEST, SOUTH, WEST
## Confidence level used: 0.95
We summarized the above results (stratified by time of day) into the table below. Consistent with our previous findings, predicted delays are shortest for departures between 6:00 and 12:00, and within that period Southwest (WN) has the shortest predicted delay, followed by Hawaiian (HA). Predicted delays are longest for departures between 0:00 and 6:00, with 18:00 to 24:00 next, and in every period SkyWest (OO) has the longest predicted delay.
Since weather can affect delays, and weather patterns are often related to seasons, we also stratify by season to see whether there are any differences:
# stratification
# predictor: carrier
# stratified by season
# Spring
Spring <- df %>%
filter (SPRING=="yes")
delay_spring.lm = lm(DEP_DELAY_NEW ~ OP_UNIQUE_CARRIER + MONTH + factor(DAY_OF_WEEK) + DISTANCE + hour_cat + NORTHEAST + MIDWEST + SOUTH + WEST, data = Spring)
summary(delay_spring.lm)
##
## Call:
## lm(formula = DEP_DELAY_NEW ~ OP_UNIQUE_CARRIER + MONTH + factor(DAY_OF_WEEK) +
## DISTANCE + hour_cat + NORTHEAST + MIDWEST + SOUTH + WEST,
## data = Spring)
##
## Residuals:
## Min 1Q Median 3Q Max
## -159.73 -38.78 -19.33 12.25 1757.68
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.302e+02 2.427e+00 53.657 < 2e-16 ***
## OP_UNIQUE_CARRIERAS -7.917e+00 1.141e+00 -6.938 3.98e-12 ***
## OP_UNIQUE_CARRIERB6 5.910e+00 7.099e-01 8.325 < 2e-16 ***
## OP_UNIQUE_CARRIERDL 1.217e+01 5.778e-01 21.055 < 2e-16 ***
## OP_UNIQUE_CARRIEREV 2.379e+01 7.331e-01 32.446 < 2e-16 ***
## OP_UNIQUE_CARRIERF9 4.480e+00 1.257e+00 3.565 0.000363 ***
## OP_UNIQUE_CARRIERHA -9.911e+00 2.086e+00 -4.750 2.03e-06 ***
## OP_UNIQUE_CARRIERNK 5.923e+00 9.751e-01 6.074 1.25e-09 ***
## OP_UNIQUE_CARRIEROO 2.271e+01 6.876e-01 33.033 < 2e-16 ***
## OP_UNIQUE_CARRIERUA 7.155e+00 6.660e-01 10.744 < 2e-16 ***
## OP_UNIQUE_CARRIERVX 3.306e+00 1.178e+00 2.806 0.005018 **
## OP_UNIQUE_CARRIERWN -1.577e+01 5.327e-01 -29.600 < 2e-16 ***
## MONTH 9.422e-01 1.916e-01 4.916 8.83e-07 ***
## factor(DAY_OF_WEEK)2 -4.050e+00 5.913e-01 -6.849 7.47e-12 ***
## factor(DAY_OF_WEEK)3 -2.461e+00 5.648e-01 -4.357 1.32e-05 ***
## factor(DAY_OF_WEEK)4 6.977e-01 5.524e-01 1.263 0.206626
## factor(DAY_OF_WEEK)5 -6.016e-01 5.485e-01 -1.097 0.272667
## factor(DAY_OF_WEEK)6 -3.240e+00 6.328e-01 -5.120 3.06e-07 ***
## factor(DAY_OF_WEEK)7 -2.411e+00 5.810e-01 -4.149 3.34e-05 ***
## DISTANCE -1.835e-03 2.706e-04 -6.782 1.19e-11 ***
## hour_cat6 to 12 -8.800e+01 1.090e+00 -80.742 < 2e-16 ***
## hour_cat12 to 18 -8.536e+01 1.062e+00 -80.408 < 2e-16 ***
## hour_cat18 to 24 -7.232e+01 1.063e+00 -68.063 < 2e-16 ***
## NORTHEASTyes 1.779e+01 2.060e+00 8.636 < 2e-16 ***
## MIDWESTyes 1.277e+01 2.077e+00 6.150 7.78e-10 ***
## SOUTHyes 1.387e+01 2.045e+00 6.781 1.20e-11 ***
## WESTyes 7.427e+00 2.052e+00 3.619 0.000296 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 80.89 on 269489 degrees of freedom
## Multiple R-squared: 0.05893, Adjusted R-squared: 0.05884
## F-statistic: 649.1 on 26 and 269489 DF, p-value: < 2.2e-16
lsmeans(delay_spring.lm, ~ OP_UNIQUE_CARRIER )
## OP_UNIQUE_CARRIER lsmean SE df lower.CL upper.CL
## AA 95.20257 2.126453 269489 91.03478 99.37036
## AS 87.28583 2.320574 269489 82.73757 91.83409
## B6 101.11268 2.217500 269489 96.76644 105.45892
## DL 107.36821 2.109018 269489 103.23459 111.50183
## EV 118.98930 2.157419 269489 114.76082 123.21778
## F9 99.68291 2.375921 269489 95.02617 104.33965
## HA 85.29198 2.906236 269489 79.59583 90.98812
## NK 101.12551 2.261746 269489 96.69255 105.55848
## OO 117.91495 2.131479 269489 113.73731 122.09259
## UA 102.35757 2.144668 269489 98.15408 106.56107
## VX 98.50869 2.342861 269489 93.91675 103.10064
## WN 79.43340 2.099894 269489 75.31766 83.54913
##
## Results are averaged over the levels of: DAY_OF_WEEK, hour_cat, NORTHEAST, MIDWEST, SOUTH, WEST
## Confidence level used: 0.95
# Summer
Summer <- df %>%
filter (SUMMER=="yes")
delay_summer.lm = lm(DEP_DELAY_NEW ~ OP_UNIQUE_CARRIER + MONTH + factor(DAY_OF_WEEK) + DISTANCE + hour_cat + NORTHEAST + MIDWEST + SOUTH + WEST, data = Summer)
summary(delay_summer.lm)
##
## Call:
## lm(formula = DEP_DELAY_NEW ~ OP_UNIQUE_CARRIER + MONTH + factor(DAY_OF_WEEK) +
## DISTANCE + hour_cat + NORTHEAST + MIDWEST + SOUTH + WEST,
## data = Summer)
##
## Residuals:
## Min 1Q Median 3Q Max
## -149.61 -37.26 -18.91 12.31 1845.96
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.412e+02 2.227e+00 63.407 < 2e-16 ***
## OP_UNIQUE_CARRIERAS -1.128e+01 1.069e+00 -10.545 < 2e-16 ***
## OP_UNIQUE_CARRIERB6 3.339e+00 6.251e-01 5.342 9.21e-08 ***
## OP_UNIQUE_CARRIERDL -4.640e+00 5.313e-01 -8.734 < 2e-16 ***
## OP_UNIQUE_CARRIEREV 1.756e+01 7.167e-01 24.508 < 2e-16 ***
## OP_UNIQUE_CARRIERF9 3.553e+00 1.030e+00 3.449 0.000563 ***
## OP_UNIQUE_CARRIERHA -8.507e+00 2.271e+00 -3.746 0.000180 ***
## OP_UNIQUE_CARRIERNK 4.477e-01 8.934e-01 0.501 0.616299
## OP_UNIQUE_CARRIEROO 2.330e+01 6.012e-01 38.756 < 2e-16 ***
## OP_UNIQUE_CARRIERUA 8.181e+00 5.775e-01 14.167 < 2e-16 ***
## OP_UNIQUE_CARRIERVX -3.956e+00 1.206e+00 -3.279 0.001041 **
## OP_UNIQUE_CARRIERWN -1.553e+01 4.532e-01 -34.256 < 2e-16 ***
## MONTH -1.561e+00 1.721e-01 -9.072 < 2e-16 ***
## factor(DAY_OF_WEEK)2 -6.234e+00 5.203e-01 -11.983 < 2e-16 ***
## factor(DAY_OF_WEEK)3 -1.968e+00 5.180e-01 -3.798 0.000146 ***
## factor(DAY_OF_WEEK)4 -7.407e+00 4.910e-01 -15.085 < 2e-16 ***
## factor(DAY_OF_WEEK)5 -1.730e+00 4.876e-01 -3.547 0.000389 ***
## factor(DAY_OF_WEEK)6 -5.480e+00 5.448e-01 -10.058 < 2e-16 ***
## factor(DAY_OF_WEEK)7 -7.775e+00 5.276e-01 -14.737 < 2e-16 ***
## DISTANCE -2.710e-04 2.462e-04 -1.101 0.271042
## hour_cat6 to 12 -7.465e+01 8.799e-01 -84.845 < 2e-16 ***
## hour_cat12 to 18 -7.656e+01 8.514e-01 -89.920 < 2e-16 ***
## hour_cat18 to 24 -6.121e+01 8.488e-01 -72.118 < 2e-16 ***
## NORTHEASTyes 1.828e+01 1.690e+00 10.815 < 2e-16 ***
## MIDWESTyes 9.585e+00 1.703e+00 5.629 1.81e-08 ***
## SOUTHyes 1.115e+01 1.674e+00 6.662 2.71e-11 ***
## WESTyes 3.631e+00 1.681e+00 2.161 0.030730 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 79.05 on 322346 degrees of freedom
## Multiple R-squared: 0.05862, Adjusted R-squared: 0.05854
## F-statistic: 772 on 26 and 322346 DF, p-value: < 2.2e-16
lsmeans(delay_summer.lm, ~ OP_UNIQUE_CARRIER )
## OP_UNIQUE_CARRIER lsmean SE df lower.CL upper.CL
## AA 93.91076 1.741913 322346 90.49666 97.32486
## AS 82.63575 1.966615 322346 78.78124 86.49026
## B6 97.24982 1.838126 322346 93.64714 100.85249
## DL 89.27028 1.741303 322346 85.85737 92.68318
## EV 111.47517 1.804076 322346 107.93923 115.01111
## F9 97.46361 1.953448 322346 93.63491 101.29231
## HA 85.40358 2.809316 322346 79.89740 90.90976
## NK 94.35843 1.887368 322346 90.65924 98.05762
## OO 117.21112 1.751853 322346 113.77754 120.64471
## UA 102.09201 1.758640 322346 98.64513 105.53890
## VX 89.95459 2.046975 322346 85.94258 93.96660
## WN 78.38513 1.719244 322346 75.01546 81.75480
##
## Results are averaged over the levels of: DAY_OF_WEEK, hour_cat, NORTHEAST, MIDWEST, SOUTH, WEST
## Confidence level used: 0.95
# Fall
Fall <- df %>%
filter (FALL=="yes")
delay_fall.lm = lm(DEP_DELAY_NEW ~ OP_UNIQUE_CARRIER + MONTH + factor(DAY_OF_WEEK) + DISTANCE + hour_cat + NORTHEAST + MIDWEST + SOUTH + WEST, data = Fall)
summary(delay_fall.lm)
##
## Call:
## lm(formula = DEP_DELAY_NEW ~ OP_UNIQUE_CARRIER + MONTH + factor(DAY_OF_WEEK) +
## DISTANCE + hour_cat + NORTHEAST + MIDWEST + SOUTH + WEST,
## data = Fall)
##
## Residuals:
## Min 1Q Median 3Q Max
## -131.32 -34.73 -19.23 9.04 1724.55
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.351e+02 3.668e+00 36.837 < 2e-16 ***
## OP_UNIQUE_CARRIERAS -4.002e+00 1.255e+00 -3.188 0.00143 **
## OP_UNIQUE_CARRIERB6 1.282e+00 8.557e-01 1.498 0.13421
## OP_UNIQUE_CARRIERDL -1.960e-01 7.176e-01 -0.273 0.78474
## OP_UNIQUE_CARRIEREV 2.041e+01 9.361e-01 21.800 < 2e-16 ***
## OP_UNIQUE_CARRIERF9 6.929e+00 1.281e+00 5.411 6.29e-08 ***
## OP_UNIQUE_CARRIERHA -1.309e+01 2.243e+00 -5.839 5.27e-09 ***
## OP_UNIQUE_CARRIERNK 8.741e+00 1.208e+00 7.233 4.74e-13 ***
## OP_UNIQUE_CARRIEROO 2.527e+01 7.574e-01 33.363 < 2e-16 ***
## OP_UNIQUE_CARRIERUA 7.007e+00 7.627e-01 9.187 < 2e-16 ***
## OP_UNIQUE_CARRIERVX -8.588e-02 1.417e+00 -0.061 0.95167
## OP_UNIQUE_CARRIERWN -1.440e+01 6.159e-01 -23.390 < 2e-16 ***
## MONTH -1.111e+00 2.302e-01 -4.826 1.39e-06 ***
## factor(DAY_OF_WEEK)2 -1.873e+00 6.750e-01 -2.776 0.00551 **
## factor(DAY_OF_WEEK)3 -5.050e+00 6.682e-01 -7.558 4.10e-14 ***
## factor(DAY_OF_WEEK)4 -1.979e+00 6.326e-01 -3.129 0.00176 **
## factor(DAY_OF_WEEK)5 -1.648e+00 6.177e-01 -2.668 0.00763 **
## factor(DAY_OF_WEEK)6 -4.049e-01 7.489e-01 -0.541 0.58877
## factor(DAY_OF_WEEK)7 1.091e+00 6.318e-01 1.727 0.08419 .
## DISTANCE 5.553e-04 3.146e-04 1.765 0.07751 .
## hour_cat6 to 12 -6.337e+01 1.452e+00 -43.650 < 2e-16 ***
## hour_cat12 to 18 -6.276e+01 1.427e+00 -43.993 < 2e-16 ***
## hour_cat18 to 24 -5.290e+01 1.431e+00 -36.974 < 2e-16 ***
## NORTHEASTyes -2.641e+00 2.422e+00 -1.090 0.27552
## MIDWESTyes -4.301e+00 2.426e+00 -1.773 0.07625 .
## SOUTHyes -5.913e+00 2.397e+00 -2.467 0.01361 *
## WESTyes -5.770e+00 2.402e+00 -2.403 0.01628 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 76.51 on 180927 degrees of freedom
## Multiple R-squared: 0.04286, Adjusted R-squared: 0.04273
## F-statistic: 311.6 on 26 and 180927 DF, p-value: < 2.2e-16
lsmeans(delay_fall.lm, ~ OP_UNIQUE_CARRIER )
## OP_UNIQUE_CARRIER lsmean SE df lower.CL upper.CL
## AA 69.03222 2.482795 180927 64.16599 73.89844
## AS 65.03069 2.673251 180927 59.79118 70.27020
## B6 70.31379 2.574037 180927 65.26874 75.35885
## DL 68.83619 2.486525 180927 63.96266 73.70972
## EV 89.43838 2.545076 180927 84.45009 94.42667
## F9 75.96079 2.698625 180927 70.67155 81.25003
## HA 55.93765 3.263403 180927 49.54146 62.33385
## NK 77.77343 2.682119 180927 72.51653 83.03032
## OO 94.30103 2.478411 180927 89.44340 99.15866
## UA 76.03957 2.489662 180927 71.15989 80.91925
## VX 68.94633 2.754938 180927 63.54672 74.34595
## WN 54.62746 2.455069 180927 49.81558 59.43934
##
## Results are averaged over the levels of: DAY_OF_WEEK, hour_cat, NORTHEAST, MIDWEST, SOUTH, WEST
## Confidence level used: 0.95
# release some memory
rm(Spring)
rm(Summer)
rm(Fall)
# Winter
Winter <- df %>%
filter (WINTER=="yes")
delay_winter.lm = lm(DEP_DELAY_NEW ~ OP_UNIQUE_CARRIER + MONTH + factor(DAY_OF_WEEK) + DISTANCE + hour_cat + NORTHEAST + MIDWEST + SOUTH + WEST, data = Winter)
summary(delay_winter.lm)
##
## Call:
## lm(formula = DEP_DELAY_NEW ~ OP_UNIQUE_CARRIER + MONTH + factor(DAY_OF_WEEK) +
## DISTANCE + hour_cat + NORTHEAST + MIDWEST + SOUTH + WEST,
## data = Winter)
##
## Residuals:
## Min 1Q Median 3Q Max
## -143.82 -36.96 -19.53 10.63 2704.37
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.279e+02 2.540e+00 50.363 < 2e-16 ***
## OP_UNIQUE_CARRIERAS -3.481e+00 1.145e+00 -3.041 0.002362 **
## OP_UNIQUE_CARRIERB6 -1.058e+00 7.911e-01 -1.337 0.181189
## OP_UNIQUE_CARRIERDL 8.237e+00 6.526e-01 12.622 < 2e-16 ***
## OP_UNIQUE_CARRIEREV 2.209e+01 8.053e-01 27.430 < 2e-16 ***
## OP_UNIQUE_CARRIERF9 3.582e+00 1.194e+00 3.001 0.002689 **
## OP_UNIQUE_CARRIERHA -6.199e+00 1.703e+00 -3.640 0.000273 ***
## OP_UNIQUE_CARRIERNK 5.400e+00 1.106e+00 4.880 1.06e-06 ***
## OP_UNIQUE_CARRIEROO 2.710e+01 7.120e-01 38.068 < 2e-16 ***
## OP_UNIQUE_CARRIERUA 8.104e+00 7.113e-01 11.393 < 2e-16 ***
## OP_UNIQUE_CARRIERVX 2.535e+00 1.412e+00 1.795 0.072614 .
## OP_UNIQUE_CARRIERWN -1.400e+01 5.792e-01 -24.166 < 2e-16 ***
## MONTH -1.288e-01 3.333e-02 -3.863 0.000112 ***
## factor(DAY_OF_WEEK)2 -3.665e+00 6.137e-01 -5.972 2.35e-09 ***
## factor(DAY_OF_WEEK)3 -3.502e+00 6.239e-01 -5.613 1.99e-08 ***
## factor(DAY_OF_WEEK)4 -5.977e+00 6.007e-01 -9.951 < 2e-16 ***
## factor(DAY_OF_WEEK)5 -8.684e-01 5.838e-01 -1.488 0.136859
## factor(DAY_OF_WEEK)6 4.928e+00 6.265e-01 7.866 3.69e-15 ***
## factor(DAY_OF_WEEK)7 6.164e+00 6.045e-01 10.198 < 2e-16 ***
## DISTANCE -1.530e-03 2.892e-04 -5.290 1.22e-07 ***
## hour_cat6 to 12 -6.894e+01 1.214e+00 -56.787 < 2e-16 ***
## hour_cat12 to 18 -6.721e+01 1.191e+00 -56.447 < 2e-16 ***
## hour_cat18 to 24 -5.716e+01 1.196e+00 -47.781 < 2e-16 ***
## NORTHEASTyes 1.478e+00 2.229e+00 0.663 0.507421
## MIDWESTyes -6.304e-01 2.238e+00 -0.282 0.778186
## SOUTHyes -9.811e-01 2.208e+00 -0.444 0.656834
## WESTyes -3.180e+00 2.212e+00 -1.437 0.150583
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 81.45 on 240975 degrees of freedom
## Multiple R-squared: 0.04512, Adjusted R-squared: 0.04501
## F-statistic: 437.9 on 26 and 240975 DF, p-value: < 2.2e-16
lsmeans(delay_winter.lm, ~ OP_UNIQUE_CARRIER )
## OP_UNIQUE_CARRIER lsmean SE df lower.CL upper.CL
## AA 75.47479 2.290322 240975 70.98582 79.96376
## AS 71.99368 2.457574 240975 67.17691 76.81046
## B6 74.41697 2.387680 240975 69.73718 79.09676
## DL 83.71220 2.278869 240975 79.24567 88.17872
## EV 97.56347 2.323690 240975 93.00910 102.11784
## F9 79.05717 2.479242 240975 74.19792 83.91642
## HA 69.27538 2.770017 240975 63.84622 74.70455
## NK 80.87487 2.457624 240975 76.05799 85.69175
## OO 102.57875 2.287613 240975 98.09509 107.06241
## UA 83.57897 2.306490 240975 79.05831 88.09963
## VX 78.01017 2.595625 240975 72.92282 83.09753
## WN 61.47852 2.262378 240975 57.04432 65.91272
##
## Results are averaged over the levels of: DAY_OF_WEEK, hour_cat, NORTHEAST, MIDWEST, SOUTH, WEST
## Confidence level used: 0.95
# release some memory
rm(delay.lm)
rm(delay_day.lm)
rm(delay_fall.lm)
rm(delay_hour.lm)
## Warning in rm(delay_hour.lm): object 'delay_hour.lm' not found
rm(delay_spring.lm)
rm(delay_summer.lm)
rm(delay_winter.lm)
rm(delay2.lm)
rm(Winter)
We summarized the above results (stratified by season) into the table below:
We first see that, in general, predicted delays are shortest during Fall for every carrier. Southwest (WN) has the shortest predicted delay in all four seasons. Hawaiian (HA) is second shortest in Spring, Fall and Winter, while Alaska (AS) does better than Hawaiian in Summer. The longest predicted delays belong to ExpressJet (EV) in Spring and to SkyWest (OO) in Summer, Fall and Winter. JetBlue (B6) ranks between fourth and eighth of the twelve carriers depending on the season.
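The four seasonal fits above repeat the same formula on four subsets. As a sketch of a more compact way to do this (on simulated data, with made-up names `toy`, `month`, `carrier` and `delay`), one could build a single season factor and fit the model inside each split, keeping only the coefficients so that the large model objects do not have to be removed by hand.

```r
# Simulated data; names are illustrative, not the BTS columns.
set.seed(3)
n <- 1200
toy <- data.frame(
  month   = sample(1:12, n, replace = TRUE),
  carrier = factor(sample(c("AA", "B6", "WN"), n, replace = TRUE))
)
toy$delay <- 70 - 10 * (toy$carrier == "WN") + rnorm(n, sd = 20)

# One season factor instead of four yes/no columns (index = month number).
season_of_month <- c("Winter", "Winter", "Spring", "Spring", "Spring",
                     "Summer", "Summer", "Summer", "Fall", "Fall", "Fall",
                     "Winter")
toy$season <- factor(season_of_month[toy$month],
                     levels = c("Spring", "Summer", "Fall", "Winter"))

# Fit the same formula within each season and keep only the coefficients.
coefs_by_season <- sapply(split(toy, toy$season), function(d) {
  coef(lm(delay ~ carrier, data = d))
})
print(round(coefs_by_season, 2))
```

The result is a matrix with one column per season, which makes the seasonal comparison of carrier effects easier to read side by side.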
Since the delay times may contain extreme values, we also dichotomize delay into a binary variable (delayed 15+ minutes vs. delayed less than 15 minutes or not delayed) and compare the carriers using logistic regression models:
# predictor: carrier (adjusted)
# Spring
Spring <- df %>%
filter (SPRING=="yes")
logit_model <- glm(DEP_DEL15 ~ OP_UNIQUE_CARRIER + MONTH + factor(DAY_OF_WEEK) + DISTANCE + hour_cat + NORTHEAST + MIDWEST + SOUTH + WEST, data=Spring, family = "binomial")
## Warning: glm.fit: algorithm did not converge
summary(logit_model)
##
## Call:
## glm(formula = DEP_DEL15 ~ OP_UNIQUE_CARRIER + MONTH + factor(DAY_OF_WEEK) +
## DISTANCE + hour_cat + NORTHEAST + MIDWEST + SOUTH + WEST,
## family = "binomial", data = Spring)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## 2.409e-06 2.409e-06 2.409e-06 2.409e-06 2.409e-06
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.657e+01 1.069e+04 0.002 0.998
## OP_UNIQUE_CARRIERAS -3.866e-09 5.024e+03 0.000 1.000
## OP_UNIQUE_CARRIERB6 -1.126e-09 3.125e+03 0.000 1.000
## OP_UNIQUE_CARRIERDL -1.738e-09 2.544e+03 0.000 1.000
## OP_UNIQUE_CARRIEREV 3.649e-09 3.228e+03 0.000 1.000
## OP_UNIQUE_CARRIERF9 -3.154e-09 5.532e+03 0.000 1.000
## OP_UNIQUE_CARRIERHA -8.558e-09 9.186e+03 0.000 1.000
## OP_UNIQUE_CARRIERNK 1.272e-08 4.293e+03 0.000 1.000
## OP_UNIQUE_CARRIEROO -6.548e-09 3.027e+03 0.000 1.000
## OP_UNIQUE_CARRIERUA -3.160e-09 2.932e+03 0.000 1.000
## OP_UNIQUE_CARRIERVX -4.949e-09 5.188e+03 0.000 1.000
## OP_UNIQUE_CARRIERWN -4.270e-09 2.345e+03 0.000 1.000
## MONTH -2.763e-09 8.438e+02 0.000 1.000
## factor(DAY_OF_WEEK)2 1.699e-09 2.603e+03 0.000 1.000
## factor(DAY_OF_WEEK)3 2.947e-09 2.487e+03 0.000 1.000
## factor(DAY_OF_WEEK)4 -2.583e-10 2.432e+03 0.000 1.000
## factor(DAY_OF_WEEK)5 2.419e-09 2.415e+03 0.000 1.000
## factor(DAY_OF_WEEK)6 2.439e-08 2.786e+03 0.000 1.000
## factor(DAY_OF_WEEK)7 -1.888e-09 2.558e+03 0.000 1.000
## DISTANCE 8.025e-13 1.191e+00 0.000 1.000
## hour_cat6 to 12 -8.322e-09 4.799e+03 0.000 1.000
## hour_cat12 to 18 -5.579e-09 4.674e+03 0.000 1.000
## hour_cat18 to 24 -5.768e-09 4.678e+03 0.000 1.000
## NORTHEASTyes -9.432e-09 9.070e+03 0.000 1.000
## MIDWESTyes -4.928e-09 9.143e+03 0.000 1.000
## SOUTHyes -5.917e-09 9.003e+03 0.000 1.000
## WESTyes -8.009e-09 9.036e+03 0.000 1.000
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 0.0000e+00 on 269515 degrees of freedom
## Residual deviance: 1.5636e-06 on 269489 degrees of freedom
## AIC: 54
##
## Number of Fisher Scoring iterations: 25
exp(coef(logit_model))
## (Intercept) OP_UNIQUE_CARRIERAS OP_UNIQUE_CARRIERB6
## 344742669341 1 1
## OP_UNIQUE_CARRIERDL OP_UNIQUE_CARRIEREV OP_UNIQUE_CARRIERF9
## 1 1 1
## OP_UNIQUE_CARRIERHA OP_UNIQUE_CARRIERNK OP_UNIQUE_CARRIEROO
## 1 1 1
## OP_UNIQUE_CARRIERUA OP_UNIQUE_CARRIERVX OP_UNIQUE_CARRIERWN
## 1 1 1
## MONTH factor(DAY_OF_WEEK)2 factor(DAY_OF_WEEK)3
## 1 1 1
## factor(DAY_OF_WEEK)4 factor(DAY_OF_WEEK)5 factor(DAY_OF_WEEK)6
## 1 1 1
## factor(DAY_OF_WEEK)7 DISTANCE hour_cat6 to 12
## 1 1 1
## hour_cat12 to 18 hour_cat18 to 24 NORTHEASTyes
## 1 1 1
## MIDWESTyes SOUTHyes WESTyes
## 1 1 1
# predictor: carrier (adjusted)
# Summer
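rm(Spring)  # drop the previous season's subset to free memory
Summer <- df %>%
  filter(SUMMER == "yes")
# Logistic regression of 15+ minute departure delay on carrier, adjusted for month, weekday, distance, departure hour and region
logit_model <- glm(DEP_DEL15 ~ OP_UNIQUE_CARRIER + MONTH + factor(DAY_OF_WEEK) + DISTANCE + hour_cat + NORTHEAST + MIDWEST + SOUTH + WEST, data = Summer, family = "binomial")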
## Warning: glm.fit: algorithm did not converge
summary(logit_model)
##
## Call:
## glm(formula = DEP_DEL15 ~ OP_UNIQUE_CARRIER + MONTH + factor(DAY_OF_WEEK) +
## DISTANCE + hour_cat + NORTHEAST + MIDWEST + SOUTH + WEST,
## family = "binomial", data = Summer)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## 2.409e-06 2.409e-06 2.409e-06 2.409e-06 2.409e-06
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.657e+01 1.003e+04 0.003 0.998
## OP_UNIQUE_CARRIERAS -1.111e-08 4.817e+03 0.000 1.000
## OP_UNIQUE_CARRIERB6 1.562e-10 2.816e+03 0.000 1.000
## OP_UNIQUE_CARRIERDL 1.096e-09 2.393e+03 0.000 1.000
## OP_UNIQUE_CARRIEREV 6.147e-09 3.229e+03 0.000 1.000
## OP_UNIQUE_CARRIERF9 8.717e-10 4.641e+03 0.000 1.000
## OP_UNIQUE_CARRIERHA -1.534e-08 1.023e+04 0.000 1.000
## OP_UNIQUE_CARRIERNK -1.273e-10 4.024e+03 0.000 1.000
## OP_UNIQUE_CARRIEROO -1.895e-08 2.708e+03 0.000 1.000
## OP_UNIQUE_CARRIERUA 2.784e-16 2.602e+03 0.000 1.000
## OP_UNIQUE_CARRIERVX -1.130e-08 5.435e+03 0.000 1.000
## OP_UNIQUE_CARRIERWN -4.093e-12 2.042e+03 0.000 1.000
## MONTH 1.008e-08 7.753e+02 0.000 1.000
## factor(DAY_OF_WEEK)2 2.778e-08 2.344e+03 0.000 1.000
## factor(DAY_OF_WEEK)3 2.765e-08 2.334e+03 0.000 1.000
## factor(DAY_OF_WEEK)4 2.720e-08 2.212e+03 0.000 1.000
## factor(DAY_OF_WEEK)5 2.745e-08 2.197e+03 0.000 1.000
## factor(DAY_OF_WEEK)6 2.691e-08 2.454e+03 0.000 1.000
## factor(DAY_OF_WEEK)7 2.784e-08 2.377e+03 0.000 1.000
## DISTANCE 1.305e-12 1.109e+00 0.000 1.000
## hour_cat6 to 12 4.340e-09 3.964e+03 0.000 1.000
## hour_cat12 to 18 -1.283e-08 3.835e+03 0.000 1.000
## hour_cat18 to 24 2.094e-09 3.824e+03 0.000 1.000
## NORTHEASTyes 3.556e-10 7.615e+03 0.000 1.000
## MIDWESTyes -6.186e-09 7.670e+03 0.000 1.000
## SOUTHyes 8.440e-10 7.543e+03 0.000 1.000
## WESTyes 1.235e-08 7.572e+03 0.000 1.000
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 0.0000e+00 on 322372 degrees of freedom
## Residual deviance: 1.8703e-06 on 322346 degrees of freedom
## AIC: 54
##
## Number of Fisher Scoring iterations: 25
exp(coef(logit_model))
## (Intercept) OP_UNIQUE_CARRIERAS OP_UNIQUE_CARRIERB6
## 344742560264 1 1
## OP_UNIQUE_CARRIERDL OP_UNIQUE_CARRIEREV OP_UNIQUE_CARRIERF9
## 1 1 1
## OP_UNIQUE_CARRIERHA OP_UNIQUE_CARRIERNK OP_UNIQUE_CARRIEROO
## 1 1 1
## OP_UNIQUE_CARRIERUA OP_UNIQUE_CARRIERVX OP_UNIQUE_CARRIERWN
## 1 1 1
## MONTH factor(DAY_OF_WEEK)2 factor(DAY_OF_WEEK)3
## 1 1 1
## factor(DAY_OF_WEEK)4 factor(DAY_OF_WEEK)5 factor(DAY_OF_WEEK)6
## 1 1 1
## factor(DAY_OF_WEEK)7 DISTANCE hour_cat6 to 12
## 1 1 1
## hour_cat12 to 18 hour_cat18 to 24 NORTHEASTyes
## 1 1 1
## MIDWESTyes SOUTHyes WESTyes
## 1 1 1
# predictor: carrier (adjusted)
# Fall
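rm(Summer)  # drop the previous season's subset to free memory
Fall <- df %>%
  filter(FALL == "yes")
# Logistic regression of 15+ minute departure delay on carrier, adjusted for month, weekday, distance, departure hour and region
logit_model <- glm(DEP_DEL15 ~ OP_UNIQUE_CARRIER + MONTH + factor(DAY_OF_WEEK) + DISTANCE + hour_cat + NORTHEAST + MIDWEST + SOUTH + WEST, data = Fall, family = "binomial")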
## Warning: glm.fit: algorithm did not converge
summary(logit_model)
##
## Call:
## glm(formula = DEP_DEL15 ~ OP_UNIQUE_CARRIER + MONTH + factor(DAY_OF_WEEK) +
## DISTANCE + hour_cat + NORTHEAST + MIDWEST + SOUTH + WEST,
## family = "binomial", data = Fall)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## 2.409e-06 2.409e-06 2.409e-06 2.409e-06 2.409e-06
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.657e+01 1.707e+04 0.002 0.999
## OP_UNIQUE_CARRIERAS -3.192e-09 5.843e+03 0.000 1.000
## OP_UNIQUE_CARRIERB6 2.512e-09 3.983e+03 0.000 1.000
## OP_UNIQUE_CARRIERDL -4.367e-08 3.340e+03 0.000 1.000
## OP_UNIQUE_CARRIEREV 2.736e-09 4.357e+03 0.000 1.000
## OP_UNIQUE_CARRIERF9 5.362e-09 5.960e+03 0.000 1.000
## OP_UNIQUE_CARRIERHA -1.147e-08 1.044e+04 0.000 1.000
## OP_UNIQUE_CARRIERNK 2.481e-09 5.625e+03 0.000 1.000
## OP_UNIQUE_CARRIEROO 5.777e-09 3.525e+03 0.000 1.000
## OP_UNIQUE_CARRIERUA 3.993e-10 3.550e+03 0.000 1.000
## OP_UNIQUE_CARRIERVX -2.829e-09 6.595e+03 0.000 1.000
## OP_UNIQUE_CARRIERWN 4.467e-09 2.867e+03 0.000 1.000
## MONTH 7.707e-09 1.071e+03 0.000 1.000
## factor(DAY_OF_WEEK)2 3.043e-08 3.142e+03 0.000 1.000
## factor(DAY_OF_WEEK)3 2.877e-08 3.110e+03 0.000 1.000
## factor(DAY_OF_WEEK)4 2.941e-08 2.945e+03 0.000 1.000
## factor(DAY_OF_WEEK)5 2.994e-08 2.875e+03 0.000 1.000
## factor(DAY_OF_WEEK)6 2.921e-08 3.486e+03 0.000 1.000
## factor(DAY_OF_WEEK)7 3.004e-08 2.941e+03 0.000 1.000
## DISTANCE 1.222e-12 1.464e+00 0.000 1.000
## hour_cat6 to 12 2.953e-09 6.758e+03 0.000 1.000
## hour_cat12 to 18 3.353e-09 6.640e+03 0.000 1.000
## hour_cat18 to 24 -2.624e-08 6.659e+03 0.000 1.000
## NORTHEASTyes -7.395e-09 1.127e+04 0.000 1.000
## MIDWESTyes -7.153e-09 1.129e+04 0.000 1.000
## SOUTHyes -2.639e-08 1.115e+04 0.000 1.000
## WESTyes -8.377e-09 1.118e+04 0.000 1.000
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 0.0000e+00 on 180953 degrees of freedom
## Residual deviance: 1.0498e-06 on 180927 degrees of freedom
## AIC: 54
##
## Number of Fisher Scoring iterations: 25
exp(coef(logit_model))
## (Intercept) OP_UNIQUE_CARRIERAS OP_UNIQUE_CARRIERB6
## 344742568285 1 1
## OP_UNIQUE_CARRIERDL OP_UNIQUE_CARRIEREV OP_UNIQUE_CARRIERF9
## 1 1 1
## OP_UNIQUE_CARRIERHA OP_UNIQUE_CARRIERNK OP_UNIQUE_CARRIEROO
## 1 1 1
## OP_UNIQUE_CARRIERUA OP_UNIQUE_CARRIERVX OP_UNIQUE_CARRIERWN
## 1 1 1
## MONTH factor(DAY_OF_WEEK)2 factor(DAY_OF_WEEK)3
## 1 1 1
## factor(DAY_OF_WEEK)4 factor(DAY_OF_WEEK)5 factor(DAY_OF_WEEK)6
## 1 1 1
## factor(DAY_OF_WEEK)7 DISTANCE hour_cat6 to 12
## 1 1 1
## hour_cat12 to 18 hour_cat18 to 24 NORTHEASTyes
## 1 1 1
## MIDWESTyes SOUTHyes WESTyes
## 1 1 1
# predictor: carrier (adjusted)
# Winter
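rm(Fall)  # drop the previous season's subset to free memory
Winter <- df %>%
  filter(WINTER == "yes")
# Logistic regression of 15+ minute departure delay on carrier, adjusted for month, weekday, distance, departure hour and region
logit_model <- glm(DEP_DEL15 ~ OP_UNIQUE_CARRIER + MONTH + factor(DAY_OF_WEEK) + DISTANCE + hour_cat + NORTHEAST + MIDWEST + SOUTH + WEST, data = Winter, family = "binomial")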
## Warning: glm.fit: algorithm did not converge
summary(logit_model)
##
## Call:
## glm(formula = DEP_DEL15 ~ OP_UNIQUE_CARRIER + MONTH + factor(DAY_OF_WEEK) +
## DISTANCE + hour_cat + NORTHEAST + MIDWEST + SOUTH + WEST,
## family = "binomial", data = Winter)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## 2.409e-06 2.409e-06 2.409e-06 2.409e-06 2.409e-06
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.657e+01 1.110e+04 0.002 0.998
## OP_UNIQUE_CARRIERAS 2.619e-06 5.006e+03 0.000 1.000
## OP_UNIQUE_CARRIERB6 2.754e-06 3.459e+03 0.000 1.000
## OP_UNIQUE_CARRIERDL 2.587e-06 2.853e+03 0.000 1.000
## OP_UNIQUE_CARRIEREV 2.675e-06 3.521e+03 0.000 1.000
## OP_UNIQUE_CARRIERF9 3.039e-06 5.219e+03 0.000 1.000
## OP_UNIQUE_CARRIERHA 2.089e-06 7.447e+03 0.000 1.000
## OP_UNIQUE_CARRIERNK 2.808e-06 4.838e+03 0.000 1.000
## OP_UNIQUE_CARRIEROO 2.876e-06 3.113e+03 0.000 1.000
## OP_UNIQUE_CARRIERUA 2.727e-06 3.110e+03 0.000 1.000
## OP_UNIQUE_CARRIERVX 2.302e-06 6.175e+03 0.000 1.000
## OP_UNIQUE_CARRIERWN 2.897e-06 2.532e+03 0.000 1.000
## MONTH 7.692e-09 1.457e+02 0.000 1.000
## factor(DAY_OF_WEEK)2 -7.604e-08 2.683e+03 0.000 1.000
## factor(DAY_OF_WEEK)3 -7.240e-08 2.728e+03 0.000 1.000
## factor(DAY_OF_WEEK)4 -3.366e-08 2.626e+03 0.000 1.000
## factor(DAY_OF_WEEK)5 -5.111e-08 2.552e+03 0.000 1.000
## factor(DAY_OF_WEEK)6 -5.306e-08 2.739e+03 0.000 1.000
## factor(DAY_OF_WEEK)7 -3.124e-06 2.643e+03 0.000 1.000
## DISTANCE -7.319e-10 1.264e+00 0.000 1.000
## hour_cat6 to 12 -7.447e-07 5.308e+03 0.000 1.000
## hour_cat12 to 18 -2.055e-07 5.206e+03 0.000 1.000
## hour_cat18 to 24 -2.097e-07 5.230e+03 0.000 1.000
## NORTHEASTyes -3.110e-06 9.747e+03 0.000 1.000
## MIDWESTyes -2.841e-07 9.785e+03 0.000 1.000
## SOUTHyes -4.140e-07 9.655e+03 0.000 1.000
## WESTyes -3.351e-07 9.673e+03 0.000 1.000
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 0.0000e+00 on 241001 degrees of freedom
## Residual deviance: 1.3982e-06 on 240975 degrees of freedom
## AIC: 54
##
## Number of Fisher Scoring iterations: 25
exp(coef(logit_model))
## (Intercept) OP_UNIQUE_CARRIERAS OP_UNIQUE_CARRIERB6
## 3.447435e+11 1.000003e+00 1.000003e+00
## OP_UNIQUE_CARRIERDL OP_UNIQUE_CARRIEREV OP_UNIQUE_CARRIERF9
## 1.000003e+00 1.000003e+00 1.000003e+00
## OP_UNIQUE_CARRIERHA OP_UNIQUE_CARRIERNK OP_UNIQUE_CARRIEROO
## 1.000002e+00 1.000003e+00 1.000003e+00
## OP_UNIQUE_CARRIERUA OP_UNIQUE_CARRIERVX OP_UNIQUE_CARRIERWN
## 1.000003e+00 1.000002e+00 1.000003e+00
## MONTH factor(DAY_OF_WEEK)2 factor(DAY_OF_WEEK)3
## 1.000000e+00 9.999999e-01 9.999999e-01
## factor(DAY_OF_WEEK)4 factor(DAY_OF_WEEK)5 factor(DAY_OF_WEEK)6
## 1.000000e+00 9.999999e-01 9.999999e-01
## factor(DAY_OF_WEEK)7 DISTANCE hour_cat6 to 12
## 9.999969e-01 1.000000e+00 9.999993e-01
## hour_cat12 to 18 hour_cat18 to 24 NORTHEASTyes
## 9.999998e-01 9.999998e-01 9.999969e-01
## MIDWESTyes SOUTHyes WESTyes
## 9.999997e-01 9.999996e-01 9.999997e-01
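The convergence warnings, the null deviance of zero, and the intercept of about 26.57 in each fit above are what `glm` typically reports when the outcome takes a single value in the data being fitted. A minimal sketch of a pre-fit check follows, using a small simulated data frame rather than the BTS data; `toy`, `check_outcome`, and the delay rate are hypothetical and introduced only for illustration.

```r
# Toy data (simulated, not the BTS data): two seasons, one with a constant outcome.
set.seed(1)
toy <- data.frame(
  season    = rep(c("summer", "fall"), each = 200),
  carrier   = factor(sample(c("AA", "DL", "HA"), 400, replace = TRUE)),
  DEP_DEL15 = c(rbinom(200, 1, 0.2), rep(1, 200))  # "fall" holds only delayed flights
)

# Hypothetical helper: TRUE when a 0/1 outcome has both classes present.
check_outcome <- function(y) length(unique(y[!is.na(y)])) == 2

# Check each season before fitting.
ok <- sapply(split(toy$DEP_DEL15, toy$season), check_outcome)
print(ok)

# Fit only the seasons where the outcome actually varies.
fits <- lapply(names(ok)[ok], function(s) {
  glm(DEP_DEL15 ~ carrier, data = toy[toy$season == s, ], family = "binomial")
})
names(fits) <- names(ok)[ok]
```

If a check like this returned `FALSE` for a season subset, it would suggest revisiting how `df` was filtered before interpreting any coefficients from that fit.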
We summarized the season-stratified logistic regression results in the table below:
As in the linear regression results, Hawaiian Airlines has the lowest odds of a 15+ minute delay relative to American Airlines in every season, while the highest odds belong to either JetBlue or Virgin America, depending on the season.
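The comparison with American Airlines relies on `AA` being the reference level of `OP_UNIQUE_CARRIER`, so each exponentiated carrier coefficient is an odds ratio against it. Below is a minimal sketch of how a per-season odds ratio table could be assembled. It uses simulated data, not the BTS data; `toy`, `or_by_season`, and the delay probabilities are assumptions made purely for illustration.

```r
set.seed(42)
n <- 6000
toy <- data.frame(
  season  = sample(c("spring", "summer"), n, replace = TRUE),
  carrier = factor(sample(c("AA", "B6", "HA"), n, replace = TRUE),
                   levels = c("AA", "B6", "HA"))  # first level is the reference
)

# Assumed delay probabilities per carrier, chosen only so the example has signal.
p <- c(AA = 0.20, B6 = 0.30, HA = 0.10)[as.character(toy$carrier)]
toy$DEP_DEL15 <- rbinom(n, 1, p)

# Fit one model per season and collect the odds ratios versus AA.
or_by_season <- sapply(split(toy, toy$season), function(d) {
  fit <- glm(DEP_DEL15 ~ carrier, data = d, family = "binomial")
  exp(coef(fit))[c("carrierB6", "carrierHA")]
})
print(round(or_by_season, 2))
```

Rows are carriers and columns are seasons. A value below 1 means lower odds of a 15+ minute delay than the reference carrier, and a value above 1 means higher odds.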